title: Towards a Strategic Model for Safeguarding the Preservation of Business Value During Human Interactions with Information Systems
authors: Grobler, Chris D.; van der Merwe, Thomas M.
journal: Responsible Design, Implementation and Use of Information and Communication Technology

This paper considers the dichotomy inherent in information systems, where their introduction for the purpose of creating new or sustaining existing business value subsequently also, inadvertently or deliberately, dissipates value. We investigate the root people-induced causes, delineated within a rudimentary conceptual technology value framework. To support a qualitative investigation, the framework is then applied as the basis for a series of interviews within a major South African financial institution, spanning the disciplines of information technology, business operations and organisational development. The constructs identified are discussed and find gestalt in an adjusted technology value model which can be used to safeguard business value against destructive HCI behaviors.

A topic that has been debated for many years without a clear resolution is the dualistic nature of an information system (IS) impacting on business value, where the adoption and use of an IS in an organisation as an explicit value creator also brings about the destruction of business value [ ]. In adopting a slightly dystopic view, our focus in this paper is seated within the context of the potentially negative impact that end-users have on organisations when discontinuing the use of a particular mandated IS [ ], or when misusing information within an IS that is intended to drive value realisation [ , ]. Of the numerous studies [e.g. , ] that have placed specific focus on the interactive relationships between humans and computers (HCI) and endeavoured to explain how these relationships contribute positively towards organisational objectives, none have attempted to expressly illuminate the phenomenon where human agents erode information technology enabled benefits through the consumption of IS. Although the primary cause of organisational value erosion was identified by several authors [e.g. , ] to be the human agent, none of these authors attempted to articulate the actual IS-related behavioural activities or actions executed by the human agent that contribute directly to business or organisational value erosion. This is the issue that this paper seeks to investigate. The primary research question this paper asks is: how can business value be safeguarded against destructive HCI behaviors? The primary purpose is to build a value framework from which an empirically-endorsed model can be constructed, and through which the unintended business value dissipating effects on institutions, as a direct result of end-users' misuse of IS, may be investigated and moderated. Three secondary objectives that dictate the structure of this paper are pursued: (1) to review key characteristics from several germane models and theories relating to the business impact of HCI that map to, and refine, a rudimentary conceptual technology value framework (CTVF); (2) to apply the CTVF as the basis for a qualitative investigation from which an adjusted technology value model (ATVM) may be derived and contextualized; and (3) to present the ATVM as a first benchmark to identify, investigate, mitigate and minimise or eliminate unintentional value destroying effects.
in construction of our ctvf, key characteristics from selected germane models and theories relating to the negative business impact of hci as they relate to our own professional experiences as end-users and is practitioners were considered. here it is noted that the relationship between is use and performance is complex and therefore invites multiple theoretical approaches. our approach is mainly inductive i.e. the ctvf we propose is not intended to be fully grounded in the literature. we simply chose to follow a bottom-up approach while maintaining a minimum level of theoretical sensitivity. an extensive review of the literature yielded the following three models that contained elements of both hci and delinquent employee behavior which in turn provided the conceptual constructs framing the ctvf delineated in fig. . task-technology fit (ttf): the model [ ] asserts that for technology to have a positive impact on a user's performance, is user utilisation is required, while an alignment between the characteristics of the task that the user must perform, and the technology needs to exist. in our view, the ttf supports a phenomenon where users may unintentionally misuse an is if some form of misalignment between the user, enabling technology and the task that the user must perform, exists. lazy user theory (lut): the theory [ ] moves from the premise that in fulfilling a user need, he will be biased towards those solutions that are perceived as most suitable and usable based on the lowest level of effort. the lut presents a theoretical situation where a user, partial to a legacy is, will favour the use of said system above that of a newly introduced system, thereby passively disusing the new system, regardless of its utility. agency theory (at): when a self-interested (agent) individual is requested (by a principal) to perform a specific task, he will be motivated by three different conditions: he will perform the task because he is forced to, or he knows he must, or he wants to [ ] . at potentially considers the phenomenon where a self-interested employee will endeavour to actively abuse the company's is for personal gain, and/or, in extreme cases, intentionally sabotage the assets of an organisation to achieve some selfinterested objective. from the succinct review of the model/theories it is evident that the benefits realisation of is usage is fundamentally informed by the actions and behaviours of individuals within the organisational context. within the context of systems thinking, the structure of a system is constituted by the systemic interrelationships between feedback loops, concluding that said structure constitute the primary driver for a system's behaviour [ ] . within our ctvf, two feedback loops are proposed. the degree of control loop attempts to control both quiescent and recalcitrant user behaviour during system usage, while the degree of influence loop endeavours to influence user beliefs, attitudes, and intention, towards correct and optimal system use. in fig. , moving from left to right, the ctvf constructs are described as follows: behavioural beliefs, behavioural attitude and behavioural intention will be applied as in the wixom and todd research model [ ] , the latter which ties constructs from the user satisfaction and technology acceptance literature into a single research model. next, unintentional misuse and passive disuse are both assumed to possess quiescent qualities. 
the unintentional misuse construct denotes actual behaviour where the user is misapplying the system, either consciously or unconsciously, due to a lack of skill or negligence. in contrast, passive disuse can be described as a user's passive-aggressive attitude towards having to use a system, causing the user to avoid interaction with said system. the two recalcitrant value eroding behaviour constructs describe a more sinister scenario. active abuse encompasses situations where a user determinedly employs the system for personal gain or to perform unauthorised transactions. finally, intentional sabotage designates the purposeful disruption or damage to a system by a disgruntled user. the outcomes of each of the actual value eroding behaviour constructs is summated into the inherent value eroded determinate which is a precursor to the mitigation gate. the latter mediates between the inherent value eroded and the residual value eroded as it attempts to moderate undesirable actioned behaviour the data collection process comprised semi-structured interviews with a convenience sample of professional and experienced employees at a major south african financial institution operating within the disciplines of information technology, business operations and organisational development. semi-structured interviews required participants to deliberate on the ctvf constructs, their validity, significance, rankings, interrelationship and impact management. a descriptive content analysis method [ ] was applied to extract themes and contradictions within the data. table presents the results of our metadata analysis. it is evident that, on average, the participants used phrases and terms specific to the four value-eroding behaviours most, followed by the three behavioural constructs and the four factors comprising the mitigation of value erosion. the concept of value dissipation also scored highly, as did various behavioural relationships. value eroding potential returned mid-range totals. by and large, participants concurred with the statement that the introduction of is may not only create value for, but likewise inadvertently dissipate value from organisations. participants also consistently referred to value erosion as a by-product of value creation especially within the areas of unintentional misuse and passive disuse. unintentional misuse: users may simply not be aware that they are engaging unintentional misuse: "…if you don't know what you're doing is wrong, it means in your mind what you're doing is correct and it's appropriate. so it goes hand in hand with unintentional misuse". (participant ). unintentional misuse was extended to cases where individuals do not make optimal use of a system e.g. front-line completing only mandatory fields, with the role of management in correcting unintentional misuse considered important. passive disuse: passive disuse is perceived to be destroying value: "…that destroys value because, immediately what you have is, you have double work and you also have something that you've paid for that's not being used, so you're effectively wasting a license. so, that is definitely also dissipating value". (participant ). a shared consensus prevailed that passive disuse introduced numerous instances of complexity and undesirable noise into the overall is landscape with time pressure identified as a contributing factor to users returning to familiar legacy systems. 
individual passive disuse is also viewed a precursor to team passive disuse, where individuals rationalise improper behaviour and ultimately tend to share workarounds with their colleagues. system controls and managerial superintendence were identified as the most effective counter measures. active abuse: except for agreement on the pervasive nature of active abuse, participants did not agree on the extent to which active abuse eroded business value. active abuse and intentional sabotage were perceived to be reinforcing constructs that: "… feed each other". (participant ). some users may perceive themselves to be selfappointed end-user testers of production systems, and through actions of unsolicited active abuse create awareness of weaknesses and inefficiencies in a particular is. intentional sabotage: no evident pattern emerged. despite the improbability of intentional sabotage, many agreed that it could possibly cause the greatest harm. one participant noted that a user's deviant belief system developed: "…when people's behavioural beliefs don't align with the values of the organisation". (participant ). two participants highlighted the possibility that some users may be sabotaging systems with good intent, i.e. to draw attention to problems embedded in systems. interrelationship between the four constructs: while to most participants the relationships between respectively the two quiescent behaviours and the two recalcitrant behaviours were clear, not all agreed on the existence of potential relationships crossing over between quiescent and recalcitrant constructs. degree of control: participants described control measures as being useful in the prevention, detection and correction of undesirable behaviour, but of little value in addressing individuals' beliefs, attitudes and intentions. preventative controls were perceived to be more desirous as well as managerial oversight and the examples they set. degree of influence: of paramount importance, executive leadership should positively influence the moral values of employees, to cascade down to every end-user, and which will marginalise individuals with corrupted belief systems, attitudinal problems or malicious intention. "the technology is important but without the users to drive the systems, and effectively leaders to guide the users, the unfortunate outcome would be a failed is". (interviewee ). while all the framework constructs were qualitatively endorsed, various arguments exist for, and against relationships, or not. comparing the ctvf (fig. ) to the proposed atvm (fig. ) , it is evident that participants were not in agreement as to the flow of the former, the most common view suggesting that while there appears to be a tendency for behavioural beliefs and behavioural attitudes to display a closer relationship with the quiescent behaviours, and behavioural intention, in turn, to display a closer relationship with the recalcitrant behaviours, ultimately, any one may function as a precursor to any one of the four value eroding behaviours. moving on to the relationships between the four value-eroding behaviours, several participants argued against the existence of any kind of interrelationship between the constructs while others provided unique examples of instances where a specific primary behaviour could trigger a secondary behaviour. 
While all four mitigating constructs were perceived to be valid, the two degree-of-control constructs were seen to be more effective in mitigating value-eroding behaviour, while the two degree-of-influence constructs were seen to be less effective, yet not as costly, in the prevention of behaviours that destroy business value. The results from the primary research and the ensuing ATVM (Fig. : adjusted technology value model) are consistent with the updated DeLone and McLean model [ ] in that both utilization and user attitudes toward technology were shown to be important. The research also supported the intention-to-use construct, as it further elucidated the behavioural intent of end-users occasioning IS abuse. The research furthermore confirmed the problem of increased organisational spend on IT with little realisation of, or insufficient justification for, how, why and when IS investments create business value [ , ]. In a similar vein, the research supported the literature by explicating the continued challenge that exists within organisations to measure and communicate IT value, noting that while many IT metrics measure performance, they do not measure actual value [ ]. The investigation also confirmed the contributions made by several authors maintaining that the primary challenges experienced by technology-driven organisations lie with the human element [e.g. , ]. In conclusion, the ATVM provides a clear articulation of the actual IS-related behavioural activities or actions executed by the human agent that contribute directly to business or organisational value erosion, and offers a model of how business value can be safeguarded against destructive HCI behaviors. Future studies should focus on refining and validating the proposed ATVM.

References:
The information technology interaction model: a foundation for the MBA core course
Understanding information systems continuance: an expectation confirmation model
Information asymmetry in information systems consulting: toward a theory of relationship constraints
Agency theory (No. SMG WP / )
The effects of technological and organizational innovations on business value in the context of a new product or service deployment
Review: information technology and organizational performance: an integrative model of IT business value
The new productivity paradox
Managing for value: it's not just about the numbers
Task-technology fit and individual performance
Lazy user theory: a dynamic model to understand user selection of products and services
The impact of information systems on organizations and markets
The relationship of 'systems thinking' to action research
A theoretical integration of user satisfaction and technology acceptance
Researching information systems and computing
The updated DeLone and McLean model of information systems success
How IT creates business value: a process theory synthesis
Revisiting IS business value research: what we already know, what we still need to know, and how we can get there
Measuring IT performance and communicating value

title: The GLEaMviz computational tool, a publicly available software to explore realistic epidemic spreading scenarios at the global scale
authors: Broeck, Wouter van den; Gioannini, Corrado; Gonçalves, Bruno; Quaggiotto, Marco; Colizza, Vittoria; Vespignani, Alessandro
journal: BMC Infect Dis
Background: Computational models play an increasingly important role in the assessment and control of public health crises, as demonstrated during the H1N1 influenza pandemic. Much research has been done in recent years on the development of sophisticated data-driven models for realistic computer-based simulations of infectious disease spreading. However, only a few computational tools are presently available for assessing scenarios, predicting epidemic evolutions, and managing health emergencies that can benefit a broad audience of users, including policy makers and health institutions.

Results: We present GLEaMviz, a publicly available software system that simulates the spread of emerging human-to-human infectious diseases across the world. The GLEaMviz tool comprises three components: the client application, the proxy middleware, and the simulation engine. The latter two components constitute the GLEaMviz server. The simulation engine leverages the Global Epidemic and Mobility (GLEaM) framework, a stochastic computational scheme that integrates worldwide high-resolution demographic and mobility data to simulate disease spread on the global scale. The GLEaMviz design aims at maximizing flexibility in defining the disease compartmental model and configuring the simulation scenario; it allows the user to set a variety of parameters including compartment-specific features, transition values, and environmental effects. The output is a dynamic map and a corresponding set of charts that quantitatively describe the geo-temporal evolution of the disease. The software is designed as a client-server system. The multi-platform client, which can be installed on the user's local machine, is used to set up simulations that will be executed on the server, thus avoiding specific requirements for large computational capabilities on the user side.

Conclusions: The user-friendly graphical interface of the GLEaMviz tool, along with its high level of detail and the realism of its embedded modeling approach, opens up the platform to simulate realistic epidemic scenarios. These features make the GLEaMviz computational tool a convenient teaching/training tool as well as a first step toward the development of a computational tool aimed at facilitating the use and exploitation of computational models for the policy making and scenario analysis of infectious disease outbreaks.

The H1N1 influenza pandemic highlighted the importance of computational epidemic models for the real-time analysis of the health emergency related to the global spreading of new emerging infectious diseases [ ] [ ] [ ]. Realistic computational models are highly complex and sophisticated, integrating substantial amounts of data that characterize the population and geographical context in order to attain superior accuracy, resolution, and predictive power [ ] [ ] [ ] [ ] [ ] [ ] [ ]. The challenge consists in developing models that are able to capture the complexity of the real world at various levels by taking advantage of current information technology to provide an in silico framework for testing control scenarios that can anticipate the unfolding of an epidemic. At the same time, these computational approaches should be translated into tools accessible by a broader set of users who are the main actors in the decision-making process of health policy, especially during an emergency like an influenza pandemic.
the tradeoff between realistic and accurate descriptions of large-scale dynamics, flexibility, computational feasibility, ease of use, and accessibility of these tools creates a major challenge from both the theoretical and the computational points of view [ , , , , , ] . gleamviz is a client-server software system that can model the world-wide spread of epidemics for human transmissible diseases like influenzalike illnesses (ili), offering extensive flexibility in the design of the compartmental model and scenario setup, including computationally-optimized numerical simulations based on high-resolution global demographic and mobility data. gleamviz makes use of a stochastic and discrete computational scheme to model epidemic spread called "gleam" -global epidemic and mobility model, presented in previously published work [ , , ] which is based on a geo-referenced metapopulation approach that considers , subpopulations in countries of the world, as well as air travel flow connections and short-range commuting data. the software includes a client application with a graphical user interface (gui) for setting up and executing simulations, and retrieving and visualizing the results; the client application is publicly downloadable. the server application can be requested by public institutions and research centers; conditions of use and possible restrictions will be evaluated specifically. the tool is currently not suitable for the simulation of vector-borne diseases, infection transmission depending on local contact patterns such as sexually transmitted diseases and diseases with a time scale that would make demographic effects relevant. the tool, however, allows the introduction of mitigation policies at the global level. localized intervention in space or time can be implemented in the gleam model and their introduction in the gleamviz computational tool are planned for future releases. only a few computational tools are currently available to the public for the analysis and modeling of epidemics. these range from very simple spreadsheet-based models aimed at providing quick estimates for the number of patients and hospitalizations during a pandemic (see e.g. flusurge [ ] ) to more complicated tools based on increasingly sophisticated simulation approaches [ , , , , , ] . these tools differ in their underlying modeling approaches and in the implementation, flexibility, and accessibility of the software itself. influsim is a tool that provides a visual interface to simulate an epidemic with a deterministic compartmental model in a single population [ ] . the model includes age structure and explicit sojourn times with different stages in each compartment, extending an seir compartmentalization to include hospitalizations and intervention measures. the software provides the infectious disease dynamics and the user can set parameter values and add or remove interventions. however, no spatial structure or other forms of heterogeneity and stochasticity are considered in the model. on the other hand agent-based models describe the stochastic propagation of a disease at the individual level, thus taking into account the explicit social and spatial structure of the population under consideration. communityflu is a software tool that simulates the spread of influenza in a structured population of approximately , households with , persons [ ] . 
user interaction with the software is limited to the spreadsheet portion of the program, where one can choose the type of intervention and other parameters describing the disease and the population. a larger population is considered in flute, a publicly available tool for the stochastic simulation of an epidemic in the united states at the level of individuals [ ] . the model is based on a synthetic population, structured in a hierarchy of mixing social groups including households, household clusters, neighborhoods, and nation-wide communities. flute comes with a configuration file in text format that can be modified by an expert user to set various parameters such as the initiation of the epidemic, the reproductive number, and the interventions considered. no gui is provided, and the output of the simulations is given in the form of text files that must be analyzed through additional software. epifast involves a parallel algorithm implemented using a master-slave approach which allows for scalability on distributed memory systems, from the generation of synthetic population aggregated in mixing groups to the explicit representation of the contact patterns between individuals as they evolve in time [ ] . the epi-fast tool allows for the detailed representation and simulation of the disease on social contact networks among individuals that dynamically evolve in time and adapt to actions taken by individuals and public health interventions. the algorithm is coupled with a webbased gui and the middleware system didactic, which allows users to specify the simulation setup, execute the simulation, and visualize the results via plots. epidemic models and interventions are pre-configured, and the system can scale up to simulate a population of a large metropolitan area on the order of tens of millions of inhabitants. another class of models focuses on the global scale, by using a metapopulation approach in which the population is spatially structured into patches or subpopulations (e.g. cities) where individuals mix. these patches are connected by mobility patterns of individuals. in this vein two tools are currently available. the global epidemic model (gem) uses a metapopulation approach based on an airline network comprised of major metropolitan areas in the world for the stochastic simulation of an influenza-like illness [ ] . the tool consists of a java applet in which the user can simulate a hypothetical h n outbreak and test pre-configured intervention strategies. the compartmentalization is set to an seir model, and the parameterization can be modified in the full or stand-alone mode, but not currently in the java applet. the spatiotemporal epidemiological modeler (stem) is a modeling system for the simulation of the spread of an infectious disease in a spatially structured population [ ] . contrary to other approaches, stem is based on an extensible software platform, which promotes the contribution of data and algorithms by users. the resulting framework therefore merges datasets and approaches and its detail and realism depend on continuous developments and contributions. however, these are obtained from a variety of sources and are provided in different formats and standards, thus resulting in possible problems related to the integration and merging of datasets. such issues are left to the user to resolve. 
the existing tools described above thus offer the opportunity to use highly sophisticated data-driven approaches at the expense of flexibility and ease of use by non-experts on the one hand, or very simplified models with user-friendly guis and no specific computational requirements on the other. our approach aims at optimizing the balance of complex and sophisticated data-driven epidemic modeling at the global scale while maintaining an accessible computational speed and overall flexibility in the description of the simulation scenario, including the compartmental model, transition rates, intervention measures, and outbreak conditions by means of a user-friendly gui. in the gleamviz tool the setup of the simulations is highly flexible in that the user can design arbitrary disease compartmental models, thus allowing an extensive range of human-to-human infectious diseases and intervention strategies to be considered. the user interface has been designed in order to easily define both features specific to each compartment, such as the mobility of classes of individuals, and general environmental effects, such as seasonality for diseases like influenza. in addition, the user can define the initial settings that characterize the initial geographical and temporal conditions, the immunity profile of the population, and other parameters including but not limited to: the definition of an outbreak condition in a given country; the number of stochastic runs to be performed; and the total duration of each simulation. the tool allows the production of global spreading scenarios with geographical high resolution by just interacting with the graphic user interface. while an expert input would be required to interpret and discuss the results obtained with the software, the present computational platform facilitates the generation and analysis of scenarios from intensive data-driven simulations. the tool can be deployed both in training activities as well as to facilitate the use of large-scale computational modeling of infectious diseases in the discussion between modelers and public health stakeholders. the paper is organized as follows. the "implementation" section describes the software application architecture and its major components, including the computational model gleam. the "results and discussion" section presents in detail the gleamviz client and its components that allow for software-user interaction, including an application of the simulator to an influenza-like-illness scenario. the top-level architecture of the gleamviz tool comprises three components: the gleamviz client application, the gleamviz proxy middleware, and the simulation engine. the latter two components constitute the gleamviz server, as shown in figure . users interact with the gleamviz system by means of the client application, which provides graphical userinterfaces for designing and managing the simulations, as well as visualizing the results. the clients, however, do not themselves run the simulations. instead they establish a connection with the gleamviz proxy middleware to request the execution of a simulation by the server. multiple clients can use the same server concurrently. upon receipt of requests to run a simulation, the middleware starts the simulation engine instances required to execute the requests and monitors their status. once the simulations are completed, the gleamviz proxy middleware collects and manages the resulting simulation data to be served back to the clients. 
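As a rough illustration of the launch-and-monitor role of the proxy middleware described above, the Python sketch below starts one simulation-engine instance as a child process and waits for it to finish. It is a hypothetical fragment: the engine binary name, its command-line interface and the synchronous polling loop are assumptions made for the example, not the actual GLEaMviz server implementation, which serves many clients and simulations concurrently.

```python
import subprocess
import time

def run_engine_instance(engine_binary, config_path, poll_seconds=5):
    """Launch one engine instance for a submitted simulation and monitor it.

    engine_binary and config_path are placeholder names; a production
    middleware would do this asynchronously and track many instances at once.
    """
    proc = subprocess.Popen([engine_binary, config_path])
    while proc.poll() is None:        # engine still running
        time.sleep(poll_seconds)
    if proc.returncode != 0:
        raise RuntimeError(f"engine exited with code {proc.returncode}")
    # On success, the middleware would now collect and store the result files
    # so they can be served back to the requesting client.
```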
A schematic diagram of the workflow between client and server is shown in figure (caption: design the compartmental model of the infectious disease in the Model Builder; configure the simulation of the worldwide epidemic spreading in the Simulation Wizard; submit the simulation for execution by the engine on the server; inspect the results of a simulation in the interactive visualization; inspect all simulations and retrieve results in the Simulations History). This client-server model allows for full flexibility in its deployment: the client and server can be installed on the same machine, or on different machines connected by a local area network or the internet. The two-part decomposition of the server in terms of middleware and engines additionally allows for advanced high-volume setups in which the middleware server distributes the engine instances over a number of machines, such as those in a cluster or cloud. This architecture thus ensures high speed in large-scale simulations and does not depend on the CPU capacity available on the user's side.

The GLEaMviz simulation engine uses a stochastic metapopulation approach [ ] [ ] [ ], [ ] [ ] [ ] that considers data-driven schemes for the short-range and long-range mobility of individuals at the inter-population level, coupled with coarse-grained techniques to describe the infection dynamics within each subpopulation [ , ]. The basic mechanism for epidemic propagation occurs at multiple scales: individuals interact within each subpopulation and may contract the disease if an outbreak is taking place in that subpopulation; by travelling while infected, individuals can carry the pathogen to a non-infected region of the world, thus starting a new outbreak and shaping the spatial spread of the disease. The basic structure of GLEaM consists of three distinct layers: the population layer, the mobility layer, and the epidemic layer (see figure ) [ , ]. The population layer is based on the high-resolution population database of the Gridded Population of the World project by the Socio-Economic Data and Applications Center (SEDAC) [ ], which estimates population with a granularity given by a lattice of cells covering the whole planet at a resolution of × minutes of arc. The mobility layer integrates short-range and long-range transportation data. Long-range air travel mobility is based on travel flow data obtained from the International Air Transport Association (IATA [ ]) and the Official Airline Guide (OAG [ ]) databases, which contain the list of worldwide airport pairs connected by direct flights and the number of available seats on any given connection [ ]. The combination of the population and mobility layers allows for the subdivision of the world into geo-referenced census areas obtained by a Voronoi tessellation procedure around transportation hubs. These census areas define the subpopulations of the metapopulation modeling structure, centered on IATA airports in different countries. The model simulates the mobility of individuals between these subpopulations using a stochastic procedure defined by the airline transportation data [ ]. Short-range mobility considers commuting patterns between adjacent subpopulations, based on data collected and analyzed from more than countries on continents across the world [ ]; it is modeled with a time-scale separation approach that defines the effective force of infection in connected subpopulations [ , , ].
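The stochastic air-travel step just described can be pictured with a short Python sketch (Python is also the language of the tool's server-side wrapper and middleware). The function below moves individuals of the travel-allowed compartments out of one subpopulation towards its directly connected destinations, with probabilities proportional to the available seats. The compartment labels, the occupancy default and the set of travel-allowed compartments are assumptions for the example; this is not the GLEaM engine code.

```python
import numpy as np

rng = np.random.default_rng(42)

def air_travel_step(origin, routes, occupancy=0.8, travel_allowed=("S", "L", "I_a")):
    """One day of stochastic air travel out of a single subpopulation.

    origin: dict mapping compartment label -> integer count in the subpopulation.
    routes: dict mapping destination id -> daily seat capacity on the connection.
    Only compartments in travel_allowed may fly, mirroring the per-compartment
    mobility constraints that can be set in the Model Builder.
    Returns destination id -> compartment counts that departed today.
    """
    n_origin = sum(origin.values())
    dests = list(routes)
    # Per-individual daily probability of flying to each destination,
    # proportional to the available seats rescaled by the occupancy rate.
    p = np.array([routes[d] * occupancy / n_origin for d in dests], dtype=float)
    p_total = min(1.0, p.sum())          # guard for unrealistically small toy populations
    outflows = {d: {} for d in dests}
    for comp in travel_allowed:
        count = origin.get(comp, 0)
        if count == 0:
            continue
        leaving = rng.binomial(count, p_total)             # how many of this compartment fly
        per_dest = rng.multinomial(leaving, p / p.sum())   # split the travellers over routes
        for d, n_out in zip(dests, per_dest):
            outflows[d][comp] = int(n_out)
        origin[comp] = count - int(leaving)
    return outflows
```

A full simulation step would apply this to every subpopulation and then add the arriving travelers to their destination compartments; the short-range commuting coupling is handled differently, through the time-scale separation approach mentioned above.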
On top of the population and mobility layers lies the epidemic layer, which defines the disease and population dynamics. The infection dynamics takes place within each subpopulation and assumes a compartmentalization [ ] that the user can define according to the infectious disease under study and the intervention measures being considered. All transitions between compartments are modeled through binomial and multinomial processes to preserve the discrete and stochastic nature of the dynamics. The user can also specify the initial outbreak conditions that characterize the spreading scenario under study, enabling the seeding of the epidemic in any geographical census area in the world and defining the immunity profile of the population at initiation. Seasonality effects are still an open problem in the transmission of ILI diseases. In order to include the effect of seasonality on the observed pattern of ILI diseases, we use a standard empirical approach in which seasonality is modeled by a forcing that reduces the basic reproductive number by a factor ranging from a minimum α_min up to 1 (no seasonality) [ ]. The forcing is described by a sinusoidal function with a 12-month period that reaches its peak during winter and its minimum during summer in each hemisphere, with the two hemispheres in opposite phases. (Figure caption, layer labels: population layer; short-range mobility layer; long-range mobility layer. The short-range mobility layer covers commuting patterns between adjacent subpopulations, based on data collected and analyzed from more than countries on continents across the world, modeled with a time-scale separation approach that defines the effective force of infection in connected subpopulations. The long-range mobility layer covers the air travel flow, measured in available seats between worldwide airport pairs connected by direct flights.)

Given the population and mobility data, infection dynamics parameters, and initial conditions, GLEaM performs the simulation of stochastic realizations of the worldwide unfolding of the epidemic. From these in silico epidemics a variety of information can be gathered, such as prevalence, morbidity, number of secondary cases, number of imported cases, hospitalized patients, amounts of drugs used, and other quantities, for each subpopulation and with a time resolution of one day. GLEaM has been under continuous development and over the years it has been used: to assess the role of short-range and long-range mobility in epidemic spread [ , , ]; to retrospectively analyze the 2002-2003 SARS outbreak in order to investigate the predictive power of the model [ ]; to explore global health strategies for controlling an emerging influenza pandemic with pharmaceutical interventions under logistical constraints [ ]; and, more recently, to estimate the seasonal transmission potential of the H1N1 influenza pandemic during the early phase of the outbreak and to provide predictions for the activity peaks in the northern hemisphere [ , ].

The GLEaMviz simulation engine consists of a core that executes the simulations and a wrapper that prepares the execution based on the configuration relayed from the client by the GLEaMviz proxy middleware. The engine can perform either single-run or multi-run simulations. The single-run involves only a single stochastic realization for a given configuration setup and a random seed.
the multi-run simulation involves a number of stochastic realizations as set by the user and performed by the core (see the following section), each with the same configuration but with a different random seed. the results of the multi-run simulation are then aggregated and statistically analyzed by the wrapper code. the simulation engine writes the results to files and uses lock files to signal its status to the middleware component. the core is written in c++, resulting in a fast and efficient engine that allows the execution of a single stochastic simulation of a -year epidemic with a standard seir model in a couple of minutes on a high-end desktop computer. the wrapper code is written in python [ ] . the server components can be installed on most unix-like operating systems such as linux, bsd, mac os x, etc. the gleamviz proxy middleware is the server component that mediates between clients and simulation engines. it accepts tcp connections from clients and handles requests relayed over these connections, providing client authorization management. a basic access control mechanism is implemented that associates a specific client with the simulations it launches by issuing a private simulation identifier key upon submission. users can only retrieve the results of the simulations they launched, or simulations for which they have obtained the simulation definition file -containing the private simulation identifier key-from the original submitter. upon receipt of a request to execute a simulation, the middleware sets up the proper system environment and then launches an instance of the simulation engine with the appropriate configuration and parameters according to the instructions received from the client. for singlerun simulations, the daily results are incrementally served back to the client while the simulation is being executed. this allows for the immediate visualization of the spreading pattern, as described in "visualization interface" subsection. for multi-run simulations the results are statistically analyzed after all runs are finished, and the client has to explicitly request the retrieval of the results once they become available. the gleamviz proxy server component can be configured to keep the simulation data indefinitely or to schedule the cleanup of old simulations after a certain period of time. multi-run metadata is stored in an internal object that is serialized on a system file, ensuring that authorization information is safely kept after a server shutdown or failure. the gleamviz proxy component additionally provides control features such as accepting administrative requests at runtime in order to manage stored simulations or to modify several configuration parameters like the number of simultaneous connections allowed, the number of simultaneous simulations per client, the session timeout, etc. the middleware server is written in python [ ] and uses the twisted matrix library suite [ ] for its networking functionality. client and server communicate using a special purpose protocol, which provides commands for session handling and simulation management. commands and data are binary encoded using adobe action message format (amf ) in order to minimize bandwidth needs. the gleamviz client is a desktop application by which users interact with the gleamviz tool. 
It provides GUIs for its four main functions: (1) the design of compartmental models that define the infection dynamics; (2) the configuration of the simulation parameters; (3) the visualization of the simulation results; and (4) the management of the user's collection of simulations. In the following section we describe these components in detail. The client was developed using the Adobe AIR platform [ ] and the Flex framework [ ] and can thus be deployed on diverse operating systems, including several Windows versions, Mac OS X, and several common Linux distributions. The GLEaMviz client has a built-in updating mechanism that checks for the latest updates and developments and prompts the user to automatically download them. It also offers a menu of interface configuration options that allows the user to customize preferences about data storage, visualization options, the server connection, and others. The software system presented above is operated through the GLEaMviz client, which provides the user interface: the part of the tool actually experienced on the user side. The GLEaMviz client integrates different modules that allow the management of the entire process flow, from the definition of the model to the visualization of the results. In the following we describe the various components and provide the reader with a user study example.

The Model Builder provides a visual modeling tool for designing arbitrary compartmental models, ranging from simple SIR models to complex compartmentalizations in which multiple interventions can be considered along with disease-associated complications and other effects (an example can be found in previous work [ ]). A snapshot of the Model Builder window is shown in figure . The models are represented as flow diagrams with stylized box shapes that represent compartments and directed edges that represent transitions, consistent with standard representations of compartmental models in the literature. Through simple operations like 'click and drag' it is possible to create any structure with full flexibility in the design of the compartmentalization; the user is not restricted to a given set of pre-loaded compartments or transition dynamics. The interactive interface provided by the Model Builder enables the user to define the compartment label, the mobility constraints that apply (e.g. allowed/not allowed to travel by air or by ground), whether the compartment refers to clinical cases, as well as the color and position of their representation in the diagram (see figure ). This allows the user to model many kinds of human-to-human infectious diseases, in particular respiratory and influenza-like diseases.

The GLEaM simulation engine considers discrete individuals. All its transition processes are both stochastic and discrete, and are modeled through binomial and multinomial processes; for infection transitions, the rate at which susceptible (S) individuals contract the infection is equal to βSI/N, where N is the total size of the subpopulation. Transitions can be visually added by dragging a marker from the source to the target compartment. Spontaneous transitions are annotated with their rates, which can be modified interactively. Infection transitions are accompanied by a representation of the infection's source compartment and the applicable rate (i.e. β in the example above), which can also be modified in an interactive way. The rates can be expressed in terms of a constant value or in terms of a variable whose value needs to be specified in the variables table, as shown in figure .
the value can also be expressed by simple algebraic expressions. the client automatically checks for and reports inconsistencies in the model in order to assist the user in the design process (see bottom right window in figure ). models can be exported to xml files and stored locally, allowing the user to load a model later, modify it, and share it with other users. the diagram representation can be exported as a pdf or svg file for use in documentation or publications. a few examples of compartmental models are available for download from the simulator website. the simulation wizard provides a sequence of panels that leads the user through the definition of several configuration parameters that characterize the simulation. figure shows some of these panels. the consecutive steps of the configuration are as follows: •choice of the type of the simulation (panel a) the user is prompted with three options: create a new single-run simulation or a new multi-run simulation from scratch, or a new one based on a saved simulation previously stored in a file. •compartmental model selection and editing the user can design a new compartmental model, modify the current compartmental model (when deriving it from an existing simulation), or load a model compartmentalization from a file. •definition of the simulation parameters (panel c) the user is asked to specify various settings and parameter values for the simulation, including, e.g., the number of runs to perform (only accessible in the case of a multi-run), the initial date of the simulation, the length of the simulation (in terms of days), whether or not seasonality effects should be considered, the airplane occupancy rate, the commuting time, the conditions for the definition of an outbreak, and others. •initial assignment of the simulation (panel d) here the user assigns the initial distribution of the population amongst compartments, defining the immunity profile of the global population on the starting date. •definition of the outbreak start (panel e) this panel allows the user to define the initial conditions of the epidemic by selecting the city (or cities) seeded with the infection. •selection of output results (panel f) here the user selects the compartments that will constitute the output provided by the client at the end of the simulation. the corresponding data will be shown in the visualization window and made available for download. when all the above configuration settings are defined, the user can submit the simulation to the gleamviz server for execution. this will automatically add the simulation to the user's simulations history. it is furthermore possible to save the definition of the simulation setup to a local file, which can be imported again later or shared with other users. the simulations history is the main window of the client and provides an overview of the simulations that the user has designed and/or submitted, in addition to providing access to the model builder, the simulation wizard, and the visualization component. the overview panel shown in figure lists the simulation identifier, the submission date and time, the simulation type (i.e., single or multi-run), the execution status (i.e., initialized, start pending, started, aborted, complete, failed, or stop pending) and the results status (i.e., none, retrieve pending, retrieving, stop retrieve pending, complete, or stored locally). additional file provides a detailed explanation of all these values. 
a number of context-dependent command buttons are available once a simulation from the list is selected. those buttons allow the user to control the simulation execution, retrieve the results from the server and visualize them, clone and edit the simulation to perform a new execution, save the simulation definition or the output data to the local machine (in order to analyze the obtained data with other tools, for example), and remove the simulation. in addition to exporting the compartmental model (see the "model builder" subsection) the user can export a complete configuration of a simulation that includes the compartmental model and the entire simulation setup to a local file, which can be imported again later or shared with other users. once the execution of a simulation is finished and the results have been retrieved from the server, the client can display the results in the form of an interactive visualization of the geo-temporal evolution of the epidemic. this visualization consists of a temporal and geographic mapping of the results accompanied by a set of graphs (see figure ). the geographic mapping involves a zoomable multi-scale map on which the cells of the population layer are colored according to the number of new cases of the quantity that is being displayed. several visualization features can be customized by clicking on the gear icon and opening the settings widget. it is possible to zoom in and out and pan by means of the interface at the top left of the map. dragging the map with the mouse (on a location where there are no basin marks) can also pan the visualization. all the widgets and the graphs displayed over the map can be re-positioned according to the user's preferences by clicking and dragging the unused space in the title bar. the color coding of the map represents the number of cases on a particular day. the time evolution of the epidemic can be shown as a movie, or in the form of daily states by moving forward or backward by one day at a time. for single-run simulations it is also possible to show the airline transportation of the 'seeding' individuals by drawing the traveling edge between the origin and destination cities. in the case where the output quantity is a subset of the infectious compartments, the edges show the actual seeding of the infection. note that the evolution of the epidemic depends strongly on the model definition. for example, it is possible that some basins are infected by a latent individual that later develops the disease. in this case no seeding flight will be shown if only infectious compartments are selected as output. beside the geographical map, the visualization window displays two charts. one chart shows the number of new cases per , over time (incidence), and the other shows the cumulative number of new cases per , over time (size). for multi-run simulations, median values and corresponding % confidence intervals are shown. the menu above each chart combo lets the user choose the context for which the corresponding charts show incidence and size data. this context is either: global, one of three hemispheres, one continent, one region, one country, or one city. the currently selected day is marked by a vertical line in these plots, and the day number, counted from the initial date selected for the simulation, is shown by side of the time slider. here we present an example application of the gleamviz tool to study a realistic scenario for the mitigation of an emerging influenza pandemic. 
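Before turning to that example, note that the multi-run summaries described above (median plus a reference range across the stochastic runs, plotted as incidence and cumulative size) amount to a simple per-day aggregation. The Python sketch below assumes a 95% reference range and incidence reported per 1,000 inhabitants, since the exact percentages are not preserved in the extracted text; it is illustrative only and not the GLEaMviz charting code.

```python
import numpy as np

def summarize_runs(daily_new_cases, population):
    """Aggregate a set of stochastic runs into chart-ready series.

    daily_new_cases: array of shape (n_runs, n_days) with new cases per day.
    Returns, for incidence and cumulative size per 1,000 inhabitants,
    (median, lower, upper) series computed across runs.
    """
    per_1000 = np.asarray(daily_new_cases, dtype=float) / population * 1000.0
    cumulative = per_1000.cumsum(axis=1)

    def med_range(x):
        return (np.median(x, axis=0),
                np.percentile(x, 2.5, axis=0),    # lower bound of the assumed 95% range
                np.percentile(x, 97.5, axis=0))   # upper bound

    return {"incidence": med_range(per_1000), "size": med_range(cumulative)}
```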
Disease-control programs foresee the use of antiviral drugs for treatment and short-term prophylaxis until a vaccine becomes available [ ]. The implementation of these interventions relies both on logistical constraints [ , ], related, e.g., to the availability of drugs, and on the characteristics of the infection, including the severity of the disease and the virus's potential to develop resistance to the drugs [ ]. Here we focus on the mitigation effects of systematic antiviral (AV) treatment in delaying the activity peak and reducing the attack rate [ , , , , ], and assume that all countries have access to AV stockpiles. We consider a scenario based on the H1N1 influenza pandemic outbreak and feed the simulator with the set of parameters and initial conditions that have been estimated for that outbreak through a maximum likelihood estimate obtained with the GLEaM model [ ]. The results provided by the present example are not meant to be compared with those contained in the full analysis carried out with GLEaM [ ], because in the present example we do not consider additional mitigation strategies that were put in place during the early phase of the outbreak, such as the sanitary control measures implemented in Mexico [ , ] or the observed reduction in international travel to/from Mexico [ ]. Indeed, the current version of GLEaMviz does not allow for interventions that are geographically and/or temporally dependent; however, these features are currently under development and will be available in the next software release. For this reason the simulation scenario that we study in this application of the simulator does not aim to realistically reproduce the timing of the spreading pattern of the H1N1 pandemic. The results reported here ought to be considered as an assessment of the mitigating impact of AV treatment alone, based on the initial conditions estimated for the H1N1 outbreak, and assuming the implementation of the same AV protocol in all countries of the world.

(Figure caption: the simulation results can be inspected in an interactive visualization of the geo-temporal evolution of the epidemic. The map shows the state of the epidemic on a particular day, with infected population cells color-coded according to the number of new cases of the quantity that is being displayed. Pop-ups provide more details upon request for each city basin. The zoomable multi-scale map allows the user to get a global overview, or to focus on a part of the world. The media-player-like interface at the bottom is used to select the day of interest, or to show the evolution of the epidemic like a movie. Two sets of charts on the right show the incidence curve and the cumulative size of the epidemic for selectable areas of interest.)

We adopt a SEIR-like compartmentalization to model influenza-like illnesses [ ] in which we include the systematic successful treatment of % of the symptomatic infectious individuals (see figure ). The efficacy of the AV treatment is accounted for in the model by a % reduction in the transmissibility of the disease by an infected person under AV treatment when AV drugs are administered in a timely fashion [ , ]; we assume that the drugs are administered within one day of the onset of symptoms and that the AV treatment reduces the infectious period by one day [ , ]. The scenario with AV treatment is compared to the baseline case in which no intervention is considered, i.e. the probability of treatment is set equal to zero in all countries.
(Figure caption, compartmental structure in each subpopulation in the intervention scenario: a modified susceptible-latent-infectious-recovered model is considered, to take into account asymptomatic infections, traveling behavior while ill, and the use of antiviral drugs as a pharmaceutical measure. In particular, infectious individuals are subdivided into asymptomatic (Infectious_a), symptomatic individuals who travel while ill (Infectious_s_t), symptomatic individuals who restrict themselves from travel while ill (Infectious_s_nt), and symptomatic individuals who undergo the antiviral treatment (Infectious_AVT). A susceptible individual interacting with an infectious person may contract the illness with rate β and enter the latent compartment, where he/she is infected but not yet infectious. The infection rate is rescaled by a factor r_a in the case of asymptomatic infection [ , ], and by a factor r_AVT in the case of a treated infection. At the end of the latency period, of average duration ε^-1, each latent individual becomes infectious, showing symptoms with probability 1 - p_a and becoming asymptomatic with probability p_a [ , ]. The change in travelling behavior after the onset of symptoms is modeled with a probability p_t, set to %, that individuals stop travelling when ill [ ]. Infectious individuals recover permanently after an average infectious period μ^-1 equal to . days. We assume the antiviral treatment regimen to be administered to a % fraction (i.e. p_AVT = . ) of the symptomatic infectious individuals within one day from the onset of symptoms, reducing the infectiousness and shortening the infectious period by one day [ , ].)

The GLEaMviz simulation results are shown in figure , where the incidence profiles in two different regions of the world, North America and Western Europe, are shown for both the baseline case and the intervention scenario with AV treatment. The results refer to the median (solid line) and % reference range (shaded area) obtained from stochastic realizations of each scenario starting from the same initial conditions. The resulting incidence profiles of the baseline case peak at around mid-November and the end of November in the US and Western Europe, respectively. These results show an anticipated peak of activity for the northern hemisphere with respect to the expected peak time of seasonal influenza. In order to make a more accurate comparison with the surveillance data in these regions, we should rely on the predictions provided by models that can take into account the full spectrum of strategies that were put in place during the H1N1 outbreak, viz. the predictions obtained by GLEaM [ ]. In the case of a rapid and efficient implementation of the AV treatment protocol at the worldwide level, a delay of about weeks would be obtained in the regions under study, a result that could be essential in gaining time to deploy vaccination campaigns targeting high-risk groups and essential services. In addition, the GLEaMviz tool provides simulated results for the number of AV drugs used during the evolution of the outbreak. If we assume treatment delivery and successful administration of the drugs to % of the symptomatic cases per day, the number of AV drugs required at the activity peak in Western Europe would be . courses per , persons, and the size of the stockpile needed after the first year since the start of the pandemic would be about % of the population. Again, we assume a homogeneous treatment protocol for all countries in the world; results may vary from country to country depending on the specific evolution of the pandemic at the national level.
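The intervention scenario above combines three mechanisms described earlier in the paper: the discrete binomial/multinomial transitions of the engine, the sinusoidal seasonal rescaling of the transmission rate, and the reduced contribution of asymptomatic and treated individuals to the force of infection. The Python sketch below puts them together for a single subpopulation and a single day. It is an illustration only: the compartment names follow the figure caption above, but the default values for alpha_min, beta, r_a and r_avt are placeholders rather than the calibrated H1N1 parameters, and this is not the GLEaM engine code.

```python
import math
import numpy as np

rng = np.random.default_rng(2009)

def seasonal_factor(day, alpha_min=0.6, peak_day=15, southern=False):
    """Sinusoidal forcing with a 12-month period: equals 1 at the assumed winter
    peak and alpha_min at the summer minimum; opposite phase in the two hemispheres."""
    if southern:
        day += 365 / 2
    return alpha_min + (1 - alpha_min) * 0.5 * (1 + math.cos(2 * math.pi * (day - peak_day) / 365))

def new_latent(state, day, beta=0.6, r_a=0.5, r_avt=0.4, southern=False):
    """Binomial draw of the susceptibles who become latent on one day.

    state: dict with counts for "S", "I_s_t", "I_s_nt", "I_a", "I_avt" and the
    total population "N", following the compartments named in the caption above.
    """
    pressure = (state["I_s_t"] + state["I_s_nt"]
                + r_a * state["I_a"] + r_avt * state["I_avt"])
    lam = seasonal_factor(day, southern=southern) * beta * pressure / state["N"]
    # Each susceptible is infected independently with probability 1 - exp(-lam).
    return rng.binomial(state["S"], 1 - math.exp(-lam))
```

Newly symptomatic cases would then be split with a multinomial draw among the travelling, non-travelling and treated sub-compartments using p_t and p_AVT, mirroring the branching probabilities listed in the caption.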
courses per , persons, and the size of the stockpile needed after the first year since the start of the pandemic would be about % of the population. again, we assume a homogeneous treatment protocol for all countries in the world; results may vary from country to country depending on the specific evolution of the pandemic at the national level. computer-based simulations provide an additional instrument for emerging infectious-disease preparedness and control, allowing the exploration of diverse scenarios and the evaluation of the impact and efficacy of various intervention strategies. here we have presented a computational tool for the simulation of emerging ili infectious diseases at the global scale based on a data-driven spatial epidemic and mobility model that offers an innovative solution in terms of flexibility, realism, and computational efficiency, and provides access to sophisticated computational models in teaching/training settings and in the use and exploitation of large-scale simulations in public health scenario analysis.

[figure: simulated incidence profiles for north america and western europe in the baseline case (left panels) and in the av treatment scenario (right panels); the plots are extracted from the gleamviz tool visualization. in the upper plots of each pair the curves and shaded areas correspond to the median and % reference range of stochastic runs, respectively; the lower curves show the cumulative size of the infection; the dashed vertical line marks the same date for each scenario, clearly showing the shift in the epidemic spreading due to the av treatment. panel labels: baseline scenario, scenario with av.]

project name: gleamviz simulator v . project homepage: http://www.gleamviz.org/simulator/ operating systems (client application): windows (xp, vista, ), mac os x, linux. programming languages: c++ (gleamsim core), python (gleamproxy, gleamsim wrapper), actionscript (gleamviz). other requirements (client application): adobe air runtime, at least mb of free disk space. license: saas. any restrictions to use by non-academics: none. the server application can be requested by public institutions and research centers; conditions of use and possible restrictions will be evaluated specifically. additional file : the gleamviz computational tool. this file includes information for installing the gleamviz client and details of the features of its various components.
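as a concrete companion to the scenario above, the following minimal sketch simulates a single well-mixed subpopulation with the modified seir structure used here (asymptomatic, travelling and non-travelling symptomatic, and treated compartments) and then derives the two av-demand quantities just discussed: courses dispensed per capita at the activity peak and the cumulative stockpile after one year. it is not the gleam engine, since the metapopulation mobility coupling is absent, and all numerical parameter values are placeholders because the estimates used in the scenario are not reproduced in this text.

import numpy as np

# placeholder parameters; the values estimated for the h1n1 scenario are not
# reproduced in this text, so the numbers below are illustrative only
beta   = 0.5          # baseline transmission rate
eps    = 1 / 1.5      # inverse of the mean latency period (days^-1)
mu     = 1 / 2.5      # inverse of the mean infectious period (days^-1)
mu_avt = 1 / 1.5      # shortened infectious period under av treatment
p_a    = 0.33         # probability that an infection is asymptomatic
p_t    = 0.5          # probability that a symptomatic case stops travelling
p_avt  = 0.3          # fraction of symptomatic cases treated promptly
r_a    = 0.5          # infectiousness rescaling for asymptomatic cases
r_avt  = 0.38         # infectiousness rescaling for treated cases

def simulate(n=1_000_000, i0=10.0, days=365, dt=0.1):
    # explicit-euler integration of the single-population compartmental model
    s, l, r = n - i0, 0.0, 0.0
    inf_a, inf_s_t, inf_s_nt, inf_avt = 0.0, 0.0, i0, 0.0
    steps_per_day = int(round(1 / dt))
    new_symptomatic = []
    for step in range(days * steps_per_day):
        force = beta * (r_a * inf_a + inf_s_t + inf_s_nt + r_avt * inf_avt) / n
        new_latent = force * s * dt
        new_infectious = eps * l * dt
        new_sym = (1 - p_a) * new_infectious
        # split the newly infectious among the four infectious compartments
        d_a    = p_a * new_infectious
        d_avt  = p_avt * new_sym
        d_s_t  = (1 - p_avt) * new_sym * (1 - p_t)
        d_s_nt = (1 - p_avt) * new_sym * p_t
        s -= new_latent
        l += new_latent - new_infectious
        inf_a    += d_a    - mu * inf_a * dt
        inf_s_t  += d_s_t  - mu * inf_s_t * dt
        inf_s_nt += d_s_nt - mu * inf_s_nt * dt
        inf_avt  += d_avt  - mu_avt * inf_avt * dt
        r += (mu * (inf_a + inf_s_t + inf_s_nt) + mu_avt * inf_avt) * dt
        new_symptomatic.append(new_sym)
    # aggregate sub-daily steps into daily counts of new symptomatic cases
    return np.array(new_symptomatic).reshape(days, steps_per_day).sum(axis=1)

def av_demand(daily_symptomatic, population):
    treated = p_avt * daily_symptomatic          # one av course per treated case
    peak_per_1000 = 1000 * treated.max() / population
    stockpile_fraction = treated[:365].sum() / population
    return peak_per_1000, stockpile_fraction

pop = 1_000_000
peak, stockpile = av_demand(simulate(n=pop), pop)
print(f"courses per 1000 persons at peak: {peak:.2f}")
print(f"stockpile after one year: {100 * stockpile:.1f}% of the population")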
the transmissibility and control of pandemic influenza a (h n ) virus potential for a global dynamic of influenza a (h n ) seasonal transmission potential and activity peaks of the new influenza a(h n ): a monte carlo likelihood analysis based on human mobility modelling disease outbreaks in realistic urban social networks epifast: a fast algorithm for large scale realistic epidemic simulations on distributed memory systems multiscale mobility networks and the spatial spreading of infectious diseases strategies for containing an emerging influenza pandemic in southeast asia mitigation strategies for pandemic influenza in the united states mitigation measures for pandemic influenza in italy: an individual based model considering different scenarios flute, a publicly available stochastic influenza epidemic simulation model the influenza pandemic preparedness planning tool influsim an extensible spatial and temporal epidemiological modelling system centers for disease control and prevention (cdc) modeling the spatial spread of infectious diseases: the global epidemic and mobility computational model centers for disease control and prevention (cdc) controlling pandemic flu: the value of international air travel restrictions a mathematical model for the global spread of influenza assessing the impact of airline travel on the geographic spread of pandemic influenza forecast and control of epidemics in a globalized world delaying the international spread of pandemic influenza modeling the worldwide spread of pandemic influenza: baseline case and containment interventions predictability and epidemic pathways in global outbreaks of infectious diseases: the sars case study socioeconomic data and applications center (sedac). columbia university the architecture of complex weighted networks estimating spatial coupling in epidemiological systems: a mechanistic approach a structured epidemic model incorporating geographic mobility among regions infectious diseases in humans the role of airline transportation network in the prediction and predictability of global epidemics the modeling of global epidemics: stochastic dynamics and predictability modeling vaccination campaigns and the fall/winter activity of the new a (h n ) influenza in the northern hemisphere python programming language twisted matrix networking engine adobe flex framework modeling the critical care demand and antibiotics resources needed during the fall wave of influenza a (h n ) pandemic world health organization: pandemic preparedness antiviral treatment for the control of pandemic influenza: some logistical constraints hedging against antiviral resistance during the next influenza pandemic using small stockpiles of an alternative chemotherapy containing pandemic influenza with antiviral agents containing pandemic influenza at the source potential impact of antiviral drug use during influenza pandemic modelling of the influenza a(h n )v outbreak in mexico city secretaría de comunicaciones y transportes the who rapid pandemic assessment collaboration: pandemic potential of a strain of influenza a(h n ): early findings we are grateful to the international air transport association for making the airline commercial flight database available to us. this work has been partially funded by the nih r -da award, the lilly endowment grant - and the dtra- - award to av; the ec-ict contract no. (epiwork) to av, vc, and wvdb; the ec-fet contract no. (dynanets) to av and vc; the erc ideas contract n.erc- -stg (epifor) to vc, cg, and mq. 
authors' contributions cg, wvdb and bg designed the software architecture. wvdb and mq developed the client application. bg implemented the gleam engine. cg developed the proxy middleware. cg, vwdb, vc and av drafted the manuscript. mq and bg helped draft the manuscript. av and vc conceived and coordinated the software project, designed and coordinated the application study. all authors read and approved the final manuscript.competing interests av is consulting and has a research agreement with abbott for the modeling of h n diffusion. the other authors have declared that no competing interests exist. key: cord- -cbikq v authors: papadakos, panagiotis; kalipolitis, orfeas title: dualism in topical relevance date: - - journal: advances in information retrieval doi: . / - - - - _ sha: doc_id: cord_uid: cbikq v there are several concepts whose interpretation and meaning is defined through their binary opposition with other opposite concepts. to this end, in this paper we elaborate on the idea of leveraging the available antonyms of the original query terms for eventually producing an answer which provides a better overview of the related conceptual and information space. specifically, we sketch a method in which antonyms are used for producing dual queries, which can in turn be exploited for defining a multi-dimensional topical relevance based on the antonyms. we motivate this direction by providing examples and by conducting a preliminary evaluation that shows its importance to specific users. dualism denotes the state of two parts. the term was originally coined to denote co-eternal binary opposition and has been especially studied in philosophy. for example, there is duality in ethics (good -bad), in human beings (man -nietzsche'sübermensch or man -god) and in logic (true -false). in addition, dualism determines in a great extent our everyday lives (ugly -beautiful, happyunhappy, etc.), and our relations with other people (rich -poor, black -white, love -hate, etc.). none of these concepts can be understood without their dual concepts, since this duality and opposition generates their meaning and interpretation. dualism is also crucial in mathematics and physics (e.g., matterantimatter), and is the power behind our whole information society and our binary data. moving from philosophy, sciences and everyday life to information retrieval, we find a very vague situation. users of search engines are 'dictated' to provide a very concise and specific query that is extremely efficient for focalized search (e.g., looking for a specific hotel). on the other hand, studies show that % of user tasks are of exploratory nature [ ] . in such tasks users do not accurately know their information need and can not be satisfied by a single 'hit' [ ] . consequently, users spend a lot of time reformulating queries and investigating results, in order to construct a conceptual model regarding their information need. information needs that include non-monosemous terms can be considered such exploratory tasks. however, the simplicity of inserting terms in an empty text box and 'magically' return the most relevant object(s), will always be a desired feature. in this paper we elaborate on the idea of leveraging the available antonyms of the original query terms (if they exist), for eventually producing an answer which provides a better overview of the related information and conceptual space. 
we sketch a method in which antonyms are used for producing dual queries, which in turn can be exploited for defining a multi-dimensional topical relevance. this approach can be applied on demand, helping users to be aware of the various opposing dimensions and aspects of their topic of interest. a preliminary evaluation shows the value of the approach for some exploratory tasks and users. to the best of our knowledge, the proposed direction is not covered by the existing literature. antonyms have been studied in fuzzy logic [ ] showing a relation with negates. in the ir domain, query expansion methods are based on synonyms and semantically related terms, but do not exploit antonyms explicitly, while in relevance and pseudo-relevance feedback techniques the antonyms are essentially penalized [ ] . results diversification can produce a kind of dual clusters, but this is neither guaranteed nor controlled [ ] . "capitalism and war". consider a user exploring the relationship between capitalism and war. the user submits to a wse (web search engine) the query "capitalism and war" and starts inspecting the results. the left part of fig. shows the top- results for this query from a popular wse. the results include articles about the connection of capitalism with war from research and academic domains, as well as from socialistic, communistic and theological sites. considering a different direction, the user might also be interested about how capitalism can support peace, the dual of war. the top- results for the query "capitalism and peace" are shown at the right side of fig. . they contain a wikipedia and a research article about the capitalist peace theory, and articles about the importance of capitalism for the prosperity of modern societies and its association to peace from policy research organizations. analogously, since socialism is the economic system that opposes capitalism, the user could be interested about how socialism may promote war or support peace, by inspecting the results of the queries "socialism and war" and "socialism and peace" respectively. the top- results for each of the above queries are shown in fig. . the results for the former query include the socialism and war pamphlet written by lenin, a collection of articles by the economist and philosopher friedrich hayek, a list of articles from two marxist domains, and a critical article for both left and right views from the foundation for economic education. for the latter query, the results include articles connecting socialism with peace, like a chapter from the encyclopedia of anti-revisionism, a wikipedia article about the theoretical magazine problems of peace and socialism, and an article from a site supporting a far left u.s. party. the above hits indicate interesting directions to the original information need of the user. we argue that users should get aware of these directions for a better exploration of the domain at hand, since they can provide a more comprehensive view of the information and conceptual space. furthermore, the exploration of these directions let available supportive or counter arguments of dual concepts to emerge, leading to better informed and responsible humans and citizens. "aloe". a comprehensive view of the various different directions can be beneficial also for reducing false-positive results. for example, consider a pregnant woman that was advised to take aloe vera by mouth to relieve digestive discomfort. to check if this is true, she submits to a wse the query "aloe vera indications". 
however, since aloe can stimulate uterine contractions, increasing the risk of miscarriage or premature birth, it is crucial to know also its contraindications. the proposed direction can alleviate this problem, because this information would be contained in the results of the query "aloe vera contraindications". one can imagine various ways for leveraging antonyms. we shall hereafter use t t to denote that the terms t and t are antonyms. building on the "capitalistic" example of the previous section, according to the online dictionary wordnet , socialism capitalism, and war peace. now, we can generate all possible queries, denoted by q, where non-monosemous terms of the original query are substituted by their dual ones, as expressed by their antonyms. for example, the query "capitalism and war" will generate three extra queries: "socialism and peace", "capitalism and peace" and "socialism and war". based on q we can now define two vector spaces. in the first case, the space has |q| dimensions, where each query is a dimension of the space. each document is placed in this space according to its relevenace to each query. in the second case we assume a space with only |q| dimensions. each dimension represents a pair of dual queries, where each query in the pair contains the antonyms of the other. we denote with q q , that the queries q and q are dual. for our running example, the first pair is ("capitalism and war","socialism and peace") and the second one is ("capitalism and peace","socialism and war"). each pair defines an axis, therefore the two pairs define a d space against which we can evaluate the "value" of each document. for each axis we can consider policies for composing the relevance scores of each document to each member of a dual query. generally, there are various criteria that can be considered for assessing the value of each document or set of documents. such criteria include the bias of documents to specific queries (e.g., the original user query), the purity to a specific query, the overview factor of a document regarding either a dual query or all queries, and the diversity of the returned set of documents with respect to these queries. in general, we need to define appropriate ranking methods, that will take into account the relevance of the documents to the available queries for different criteria. therefore, we will explore whether the existing multiplecriteria approaches described in [ , , , ] are appropriate for the problem at hand. regarding the process of finding the corresponding antonyms, we can use existing dictionaries like wordnet for nouns and adjectives or word-embedding antonym detection approaches like [ , ] . the case of verbs and adverbs is more complicated since they require a kind of grammatical and language analysis (i.e., exist not exist, lot total, a lot bit, etc). there are three categories of antonyms: (a) gradable, (b) relational and (c) complementary. we have gradable antonyms (e.g., hot cold) in cases where the definitions of the words lie on a continuous spectrum. we have relational antonyms (e.g., teacher student) in cases where the two meanings are opposite only within the context of their relationship. the rest are called complementary antonyms (e.g., day night). in general, the selection of the "right" antonyms raises various questions. in many cases more than one antonyms exist, so one should decide which one(s) to select. 
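to make the query-generation step concrete, the sketch below uses wordnet (through nltk) to collect antonyms and enumerates q by substituting each non-monosemous term with a dual term; keeping only the first antonym of each term is a deliberate simplification of the selection problem raised above, not part of the original proposal.

from itertools import product
from nltk.corpus import wordnet as wn   # requires nltk.download("wordnet") once

def antonyms(term):
    # collect wordnet antonyms across all synsets and lemmas of the term
    found = set()
    for synset in wn.synsets(term):
        for lemma in synset.lemmas():
            for ant in lemma.antonyms():
                found.add(ant.name().replace("_", " "))
    return sorted(found)

def dual_queries(terms, connector=" and "):
    # every term contributes either itself or (naively) its first antonym
    options = [[t] + antonyms(t)[:1] for t in terms]
    return [connector.join(combo) for combo in product(*options)]

print(dual_queries(["capitalism", "war"]))
# expected, given the wordnet antonym pairs cited in the text:
# ['capitalism and war', 'capitalism and peace',
#  'socialism and war', 'socialism and peace']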
sometimes this can depend on the context, e.g., the antonym of "action" is "apathy", but in terms of physics or sociology the dual of "action" is "reaction". notice that the proposed approach can be exploited in any context where the aim is to retrieve semantically opposing entities, information, etc. as an example consider the argument web [ ] , where the approach could be used for retrieving contradicting arguments and providing support for each one of them. from a system's perspective, the approach can be realized in various levels and settings. in the setting of an ir system, it can be implemented by changing accordingly the query processor and the ranking module, while in a meta-search setting, by changing the query rewriting, the query forwarding and the ranking components. it could also be exploited in the query autocompletion layer. to start with, we have conducted a preliminary evaluation. we have specified information tasks which are shown in table , that can exploit the proposed approach. the tasks are of exploratory nature and were created using the task refinement steps described in [ ] . we have identified the following types of tasks: explore domain (ed), medical treatment (mt), explore product reviews (epr) and person qualities (pq). for each task we provide a description of the information need, a representative query and the relevant antonyms, which were manually selected from the list of the respective wordnet antonyms. we conducted our experiment over female and male users of various ages. for each task, they were given two lists of results. one contained the results of the query from a popular wse, and the other one was constructed by interleaving the results of the same wse for the dual queries of this task (i.e., first the top result of the original query, then the first result of its dual, etc.). the two kinds of lists were given in random order for each task. the users were asked to select the most preferred list and to provide a grade of preference taking values in { , , , , }, where means that the selected list was preferred much more than the other one. in the background, when users prefer the results of the dual approach, we change the sign of the score and make it negative. the users were not aware how the lists were constructed and were not guided in any way by the evaluator. in fig. we provide two graphs that describe the results of the evaluation. figure (a), shows the aggregated scores given by all users to each query, while fig. (b) shows the aggregated scores given by each participant to all queries. regarding the first one the results are not the expected ones, although we hypothesize that the users mainly penalized the dual approach because of the 'irrelevant' results to the original query in terms of query tokens and not in terms of relevant information. for eleven of the queries there is a strong preference towards the non-dual approach. the epr type of queries belong to this category, showing that users are probably not interested for reviews with the opposite direction of what they are looking for. this is especially true for q , where the dual approach provided results about winter vacations and was the least preferred. for two of the tasks, the approaches are almost incomparable. both of these tasks belong to the mt group. there are also two queries, q and q , where the dual approach is better, especially in the last one. 
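the interleaved result list used in this evaluation is a plain round-robin over the ranked results of the dual queries, deduplicated and cut to the usual list length; a minimal sketch is shown below, with the search function left abstract because the particular web search engine is not named.

from itertools import chain, zip_longest

def interleave(ranked_lists, k=10):
    # 1st result of query 1, 1st of query 2, ..., then 2nd of query 1, and so on
    merged, seen = [], set()
    for item in chain.from_iterable(zip_longest(*ranked_lists)):
        if item is None or item in seen:
            continue
        seen.add(item)
        merged.append(item)
        if len(merged) == k:
            break
    return merged

# hypothetical usage, where search(q) returns a ranked list of urls and
# dual_queries() is the helper from the sketch above:
# dual_list = interleave([search(q) for q in dual_queries(["capitalism", "war"])])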
in their comments for these queries, users mention that the selected (i.e., dual) list "provides a more general picture" and "more relevant and interesting results, although contradicting". regarding the second graph we have the interesting result that the proposed approach appeals to specific users. it seems that nine users ( % of the participants) have an exploratory nature and generally prefer the dual approach (six of them strongly), while for four of them the two approaches are incomparable. the rest are better served with the non-dual approach. this is an interesting outcome, and in the future we plan to identify the types of users that prefer the dual approach. we have motivated with examples why it is worth investigating dualism for nonmonosemous terms in the context of exploratory search and we have shown its importance at least for some types of users and tasks. for the future, we plan to define the appropriate antonyms selection algorithms and relevance metrics, implement the proposed functionality in a meta-search setting, and conduct a large scale evaluation with real users over exploratory tasks, to identify in which queries the dual approach is beneficial and to what types of users. query expansion techniques for information retrieval: a survey implementing the argument web evaluating subtopic retrieval methods: clustering versus diversification of search results multidimensional relevance: a new aggregation criterion supporting exploratory search multidimensional relevance: prioritized aggregation in a personalized information retrieval setting on antonym and negate in fuzzy logic improving word embeddings for antonym detection using thesauri and sentiwordnet negotiating a multidimensional framework for relevance space creating exploratory tasks for a faceted search interface word embedding-based antonym detection using thesauri and distributional information understanding user goals in web search relevance: a review of the literature and a framework for thinking on the notion in information science. part ii: nature and manifestations of relevance key: cord- -ldfgi vr authors: wen, jie; wei, lingwei; zhou, wei; han, jizhong; guo, tao title: gcn-ia: user profile based on graph convolutional network with implicit association labels date: - - journal: computational science - iccs doi: . / - - - - _ sha: doc_id: cord_uid: ldfgi vr inferring multi-label user profile plays a significant role in providing individual recommendations and exact-marketing, etc. current researches on multi-label user profile either ignore the implicit associations among labels or do not consider the user and label semantic information in the social networks. therefore, the user profile inferred always does not take full advantage of the global information sufficiently. to solve above problem, a new insight is presented to introduce implicit association labels as the prior knowledge enhancement and jointly embed the user and label semantic information. in this paper, a graph convolutional network with implicit associations (gcn-ia) method is proposed to obtain user profile. specifically, a probability matrix is first designed to capture the implicit associations among labels for user representation. then, we learn user embedding and label embedding jointly based on user-generated texts, relationships and label information. on four real-world datasets in weibo, experimental results demonstrate that gcn-ia produces a significant improvement compared with some state-of-the-art methods. 
with the growing popularity of online social networks including weibo and twitter, the "information overload" [ ] come up and the social media platforms took more effort to satisfy users' more individualized demands by providing personalized services such as recommendation systems. user profile, the actual representation to capture certain characteristics about an individual user [ ] , is the basis of recommendation system [ ] and exact-marketing [ , ] . as a result, user profiling methods, which help obtaining accurate and effective user profiles, have drawn more and more attention from industrial and academic community. a straightforward way of inferring user profiles is leveraging information from the user's activities, which requires the users to be active. however, in many real-world applications a significant portion of users are passive ones who keep following and reading but do not generate any content. as a result, label propagation user profile methods [ ] [ ] [ ] are widely studied, which mainly use the social network information rather than user's activities. in order to obtain user profile more accurately and abundantly, multi-label is applied in many researches to describe users' attributes or interests. different labels were assumed independently [ ] in some research, while the associations among labels were ignored and some implicit label features remained hidden. meanwhile, several researches [ , , ] considered the explicit associations among labels to get user profile and achieved better performance. besides the explicit associations, there exists implicit association among labels that is beneficial to make user profile more accurate and comprehensive. the previous work [ ] leveraged internal connection of labels, which is called implicit association. however, this work only considered the relation of labels, but ignored the user and label semantic information jointly based on user-generated texts, relationships and label information, which is also important for user profile. to take advantage of this insight, a graph convolutional networks with implicit label associations (gcn-ia) is proposed to get user profile. a probability matrix is first designed to capture the implicit associations among labels for user representation. then, we learn user embedding and label embedding jointly based on user-generated texts, relationships and label information. finally, we make multi-label classification based on given user representations to predict unlabeled user profiles. the main contributions of this paper are summarized as follows: -insight. we present a novel insight about combination among implicit association labels, user semantic information and label semantic information. in online social networks, due to users' personalized social and living habits, there are still certain implicit associations among labels. at the same time, user and label information from user-generated texts, relationships and label information is significant for the construction of user profile. -method. a graph convolutional networks with implicit label associations (gcn-ia) method is proposed to get user profile. we first construct the social network graph with the relationship between users and design a probability matrix to record the implicit label associations, and then combine this probability matrix with the classical gcn method to embed user and label semantic information. -evaluation. experiments evaluating gcn-ia method on real weibo data sets of different sizes are conducted. 
the comparative experiments evaluate the accuracy and effectiveness of gcn-ia. the results demonstrate that the performance is significantly improved compared with some previous methods. the following chapters are organized as follows: in sect. , related works are briefly elaborated. the sect. describes the details of gcn-ia, and experiments and results are described in sect. . finally, we summarize the conclusion and future work in sect. . label propagation method shows advantages of linear complexity and less required given user's labels, and disadvantages such as low accuracy and propagation instability. the existing label propagation methods in user profile can be divided into three parts. one is to optimize the label propagating process to obtain more stable and accurate profiles, the second part is to propagate multi-label through social network structure to get more comprehensive user profile, and the last part is to apply deep-learning methods such as gcn to infer multi-label user profile. label propagation method was optimized by leveraging more user attributes information, specifying propagation direction and improving propagation algorithm. subelj et al. proposed balanced propagation algorithm in which an increasing propagation preferences could decide the update order certain nodes, so that the randomness was counteracted by utilizing node balancers [ ] . ren et al. introduced node importance measurement based on the degree and clustering coefficient information to guide the propagation direction [ ] . li et al. leveraged user attributes information and user attributes' similarity to increase recall ratio of user profile [ ] . huang et al. redefined the label propagating process with a multi-source integration framework that considered content and network information jointly [ ] . explicit associations among labels also have been taken into consideration in some research, glenn et al. [ ] introduced the explicit association labels and the results proved the efficiency of the method. we innovatively introduced the implicit association labels into multi-label propagation [ ] , the method was proved to be convergent and faster than traditional label propagation algorithm and its performance was significantly better than the state-of-the-art method on weibo datasets. however the research [ ] ignored user embedding and label embedding jointly based on user-generated texts, relationships and label information, which seemed very important for user profile. the multi-label algorithms were widely applied to get abundant profile. gregory et al. proposed copra algorithm and extended the label and propagation step to more than one community, which means each node could get up to v labels [ ] . zhang et al. used the social relationship to mine user interests, and discovered potential interests from his approach [ ] . xie et al. recorded all the historical labels from the multi-label propagation process, which make the profile result more stable [ ] . wu et al. proposed balanced multi-label propagation by introducing a balanced belonging coefficients p, this method improved the quality and stability of user profile results on the top of copra [ ] . label propagation algorithm has been improved in different aspects in the above work, however it's still difficult to get a high accuracy and comprehensive profile due to the lack of input information and the complex community structures. 
gcn [ ] is one of the most popular deep learning methods, which can be simply understood as a feature extractor for graphs. by learning graph structure features through convolutional neural network, gcn is widely used in node classification, graph classification, edge prediction and other research fields. gcn is a semi-supervised learning method, which can infer the classification of unknown nodes by extracting the characteristics of a small number of known nodes and the graph structure. due to the high similarity with the idea of label propagation, we naturally consider constructing multilabel user profile with gcn. wu et al. proposed a social recommendation model based on gcn [ ] , in which both user embedding and item embedding were learned to study how users' interests are affected by the diffusion process of social networks. william et al. [ ] and yao et al. [ ] applied gcn for text classification and recommendation systems respectively, with node label and graph structure considered to gcn modeling. however, the existing methods rarely consider the implicit relationships between labels in the gcn based methods. this section mainly focuses on the improvement of graph convolutional networks (gcn) based on implicit association labels. the goal of this paper is to learn user representation for multi-label user profile task by modeling user-generated text and user relationships. the overall architecture of gcn-ia is shown in fig. . the model consists of three components: prior knowledge enhancement (pke) module, user representation module, and classification module. similar with other graph-based method, we formulated the social network into a heterogeneous graph. in this graph, nodes represent the users in social network and edges represent user's multiple relationships such as following, supporting and forwarding. first, pke captures the implicit associations among labels for user representation. then, user representation module learns user embedding and label embedding jointly based on user-generated texts, relationships and label information. classification module makes multi-label classification based on user representations to predict unlabeled user profiles. social networks are full of rich knowledge. according to [ ] , associations among implicit labels are very significant in user profile. in this part, we introduce the knowledge of implicit association among labels to capture the connections among users and their profile labels. a priori knowledge probability matrix p is defined as eq. ( ). probability of propagation among labels gets when higher p ij gets a higher value. associations in social network are complex due to uncertainty [ ] or special events [ ] . therefore, we define the set of labels, where elements are sampled by cooccurrence, cultural associations, event associations or custom associations, as shown in eq. ( ) . where i i (i = , , , . . .) represents respectively a set of each user's interest label set. generally, the key idea of gcns is to learn the iterative convolutional operation in graphs, where each convolutional operation means generating the current node representations from the aggregation of local neighbors in the previous layer. a gcn is a multilayer neural network that operates directly on a graph and induces embedding vectors of nodes based on properties of their neighborhoods. 
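the equations defining p and the label sets are not reproduced in this text, so the snippet below only illustrates one plausible reading of the prior knowledge enhancement step: p_ij is filled with row-normalised co-occurrence frequencies of label pairs over the labelled users' interest sets i_i. the cultural, event and custom associations mentioned above would have to be added to these sets before counting.

import numpy as np

def cooccurrence_prior(interest_sets, labels):
    # interest_sets: one set of profile labels i_i per labelled user
    index = {c: k for k, c in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)))
    for s in interest_sets:
        for a in s:
            for b in s:
                if a != b:
                    counts[index[a], index[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # row-normalise so that a larger p_ij reads as a higher propagation probability
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

labels = ["health", "women", "entertainment", "tourism", "society"]
P = cooccurrence_prior([{"health", "women"},
                        {"entertainment", "tourism", "society"},
                        {"health", "tourism"}], labels)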
in the user representation module, we apply gcns to embed users and profile labels into a vector space and learn user representation and label representation jointly from user-generated content information and social relationships. specifically, the implicit associations as prior knowledge are introduced to improve the gcns to model the associations among labels. formally, the model considers a social network g = (v , e) , where v and e are sets of nodes and edges, respectively. in our model, there are two types of nodes, user node and label node. the initialized embedding of user nodes and label nodes, denoted as x, is initialized with user name and their content via pre-trained word vec model. we build edges among nodes based on user relationships (user-user edges), users' profiles (user-label edges) and implicit associations among labels (label-label edges). we introduce an adjacency matrix a of g. and its degree matrix d, where d ii = j= ,...,n a ij . the diagonal elements of a are set to because of self-loops. the weight of the edges between a user node and a label node is based on user profile information, formulated as eq. ( ). where u is the set of all users in the social network, u gold denotes labeled users. and c is the set of labels of user profile. to utilize label co-occurrence information for knowledge enhancement, we calculate weights between two label nodes as described in sect. . . the weights between two user nodes are defined as eq. ( ) according to user relationships. u ), (u , u ) , ..)} is the set of relations between users and sim(i, j) indicates the similarity between user i and user j followed by [ ] . the less the ratio of the value is, the closer the distance is. gcn stacks multiple convolutional operations to simulate the message passing of graphs. therefore, both the information propagation process with graph structure and node attributes are well leveraged in gcns. for a one-layer gcn, the new k-dimensional node feature matrix is computed as: is a normalized symmetric adjacency matric, and w is a weight matrix. σ (·) is an activation function, e.g. a relu function σ (x) = max( , x). and the information propagation process is computed as eq. ( ) by stacking multiple gcn layers. where j denotes the layer number and l ( ) = x . the prediction of user profile is regarded as a multi-classification problem. after the above procedures, we obtain user representation according to user-generated content and relationships. the node embedding for user representation is fed into a softmax classifier to project the final representation into the target space of class probability: finally, the loss function is defined as the cross-entropy error over all labeled users as shown in eq. ( ) . where y u is the set of user's indices with labels, and f is the dimension of the output features which is equal to the number of classes. y is the label indicator matrix. the weight parameters w and w can be trained via gradient descent. weibo is the largest social network platform in china . followed by [ ] , we evaluate our method in different scale data sets in weibo. the datasets are sampled with different users in different time. and we select five classes as interest profiles of users, health, women, entertainment, tourism, society. the details of the datasets are illustrated in table . to evaluate the performance of our method (gcn-ia), we compare it with some existing methods including textual feature-based method and relation feature-based method. 
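the propagation rule referenced above is the standard one from kipf and welling's gcn, which the paper cites: the adjacency is symmetrically normalised as d^(-1/2) a d^(-1/2), each layer applies l^(j+1) = sigma(a_hat l^(j) w_j), and a softmax over the final layer gives the class probabilities. a compact numpy forward pass for the two-layer case is sketched below; training with the cross-entropy loss over labelled users is omitted, and a real implementation would use an autodiff framework.

import numpy as np

def normalise(a):
    # a carries the edge weights described above, with self-loops on the diagonal;
    # returns the symmetrically normalised adjacency d^(-1/2) a d^(-1/2)
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(a, x, w0, w1):
    # hidden = relu(a_hat x w0); output = softmax(a_hat hidden w1)
    a_hat = normalise(a)
    hidden = np.maximum(a_hat @ x @ w0, 0.0)
    return softmax(a_hat @ hidden @ w1)

# toy example: n graph nodes (users and labels), d-dimensional initial embeddings
# from word2vec, f interest classes
rng = np.random.default_rng(0)
n, d, h, f = 12, 16, 32, 5
a = rng.random((n, n)); a = (a + a.T) / 2; np.fill_diagonal(a, 1.0)
x = rng.normal(size=(n, d))
z = gcn_forward(a, x, 0.1 * rng.normal(size=(d, h)), 0.1 * rng.normal(size=(h, f)))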
in addition, to evaluate the implicit association labels for gcn, we compare gcn-ia with classical gcn. the details of these baselines are listed as follows: svm [ ] uses the method of support vector machine to construct user profile based on user-generated context. in our experiment, we select username and blogs of users to construct user representation based on textual features. the textual features are obtained via pre-trained word vec model. [ ] uses multi-label propagation method to predict user profiles. they capture relationship information by constructing probability transfer matrix. the labeled users are collected if the user is marked with a "v" which means his identity had been verified by weibo. analyzed by jing et al. [ ] , these users were very critical in the propagation. in the experiments, we will analyze the precision ratio (p) and recall ratio (r) of method which respectively represent the accuracy and comprehensiveness of user profile. and f -measure (f ) is a harmonic average of precision ratio and recall ratio, and it reviews the performance of the method. the experiment results are shown in table . the results show that our method can make a significant increase in macro-f in all datasets. compared with feature-based method, our model makes a significant improvement. svm fails since the method does not consider user relationships in the social networks. it only models the user-generated context, such as username and user's blogs. compared with relation-based method, our model achieves improvements in all datasets, especially in dataset of #, we have improved . % in macro-f . mlp-ia [ ] established user profiles based on user's relationships via label propagation. it suffers from leveraging the user-generated context, which contains semantic contextual features. our model can represent users based on both relationships and context information via gcn module, which is more beneficial for identifying multi-label user profile task. the results of each interest class in # dataset are shown in fig. . the results show that gcn-ia performs stably in all interest profiles, which demonstrate the good robustness of our model. as shown in the results, the performance is little weak for the entertainment interest class compared with baselines. in weibo, there are much blogs with aspect to entertainment. fake information exists in social network including fake reviews and fake accounts for specific purposes, which brings huge challenge for user profiles. our model constructs user profile via both textual features and relational features. the results can demonstrate that the user relationships can provide a beneficial signal for semantic feature extraction and the two features can reinforce each other. in this paper, we have studied the user profile by graph convolutional networks with implicit association labels, user information and label information embedding. we proposed a method to utilize implicit association among labels and then we take graph convolutional networks to embed the label and user information. on four real-world datasets in weibo, experimental results demonstrate that gcn-ia produces a significant improvement compared with some state-of-the-art methods. future work will pay more attention to consider more prior knowledge to get higher performance. 
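for reference, the measures reported in the tables can be computed per interest class from the gold and predicted label indicator matrices and then macro-averaged, which is essentially what a library routine such as sklearn's precision_recall_fscore_support does with average='macro':

import numpy as np

def macro_prf(y_true, y_pred, eps=1e-12):
    # y_true, y_pred: (n_users, n_classes) binary indicator matrices
    tp = (y_true * y_pred).sum(axis=0)
    precision = tp / (y_pred.sum(axis=0) + eps)
    recall = tp / (y_true.sum(axis=0) + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return precision.mean(), recall.mean(), f1.mean()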
enriching topic modelling with users' histories for improving tag prediction in q and a systems rectwitter: a semantic-based recommender system for twitter users recommendation of activity sequences during distributed events towards social recommendation system based on the data from microblogs user profiling in an ego network: co-profiling attributes and relationships application of tag propagation algorithm in the interest map of weibo users artwork personalization at netflix dynamic embeddings for user profiling in twitter rectwitter: a semantic-based recommender system for twitter users mlp-ia: multi-label user profile based on implicit association labels metric learning from probabilistic labels learning on probabilistic labels privacy-preserving class ratio estimation robust network community detection using balanced propagation node importance measurement based on the degree and clustering coefficient information a multi-source integration framework for user occupation inference in social media systems finding overlapping communities in networks by label propagation overlapping community detection in networks: the state-of-the-art and comparative study balanced multi-label propagation for overlapping community detection in social networks learning convolutional neural networks for graphs socialgcn: an efficient graph convolutional network based model for social recommendation graph convolutional neural networks for web-scale recommender systems graph convolutional networks for text classification the characterization and composition analysis of weibo inferring user interests in microblogging social networks: a survey svm based automatic user profile construction for personalized search semi-supervised classification with graph convolutional networks. iclr (poster) key: cord- - twmcitu authors: mukhina, ksenia; visheratin, alexander; nasonov, denis title: spatiotemporal filtering pipeline for efficient social networks data processing algorithms date: - - journal: computational science - iccs doi: . / - - - - _ sha: doc_id: cord_uid: twmcitu one of the areas that gathers momentum is the investigation of location-based social networks (lbsns) because the understanding of citizens’ behavior on various scales can help to improve quality of living, enhance urban management, and advance the development of smart cities. but it is widely known that the performance of algorithms for data mining and analysis heavily relies on the quality of input data. the main aim of this paper is helping lbsn researchers to perform a preliminary step of data preprocessing and thus increase the efficiency of their algorithms. to do that we propose a spatiotemporal data processing pipeline that is general enough to fit most of the problems related to working with lbsns. the proposed pipeline includes four main stages: an identification of suspicious profiles, a background extraction, a spatial context extraction, and a fake transitions detection. efficiency of the pipeline is demonstrated on three practical applications using different lbsn: touristic itinerary generation using facebook locations, sentiment analysis of an area with the help of twitter and vk.com, and multiscale events detection from instagram posts. in today's world, the idea of studying cities and society through location-based social networks (lbsns) became a standard for everyone who wants to get insights about people's behavior in a particular area in social, cultural, or political context [ ] . 
nevertheless, there are several issues concerning data from lbsns in research. firstly, social networks can use both explicit (i.e., coordinates) or implicit (i.e., place names or toponyms) geographic references [ ] ; it is a common practice to allow manual location selection and changing user's position. the twitter application relies on gps tracking, but user can correct the position using the list of nearby locations, which causes potential errors from both gps and user sides [ ] . another popular source of geo-tagged data -foursquare -also relies on a combination of the gps and manual locations selection and has the same problems as twitter. instagram provides a list of closely located points-of-interest [ ] , however, it is assumed that a person will type the title of the site manually and the system will advise the list of locations with a similar name. although this functionality gives flexibility to users, there is a high chance that a person mistypes a title of the place or selects the wrong one. in facebook, pages for places are created by the users [ ] , so all data including title of the place, address and coordinates may be inaccurate. in addition to that, a user can put false data on purpose. the problem of detecting fake and compromised accounts became a big issue in the last five years [ , ] . spammers misrepresent the real level of interest to a specific subject or degree of activity in some place to promote their services. meanwhile, fake users spread unreliable or false information to influence people's opinion [ ] . if we look into any popular lbsn, like instagram or twitter, location data contains a lot of errors [ ] . thus, all studies based on social networks as a data source face two significant issues: wrong location information stored in the service (wrong coordinates, incorrect titles, duplicates, etc.) and false information provided by users (to hide an actual position or to promote their content). thus, in this paper, we propose a set of methods for data processing designed to obtain a clean dataset representing the data from real users. we performed experimental evaluations to demonstrate how the filtering pipeline can improve the results generated by data processing algorithms. with more and more data available every minute and with a rise of methods and models based on extensive data processing [ , ] , it was shown that the users' activity strongly correlates with human activities in the real world [ ] . for solving problems related to lbsn analysis, it is becoming vital to reduce the noise in input data and preserve relevant features at the same time [ ] . thus, there is no doubt that such problem gathers more and more attention in the big data era. on the one side, data provided by social media is more abundant that standard georeferenced data since it contains several attributes (i.e., rating, comments, hashtags, popularity ranking, etc.) related to specific coordinates [ ] . on the other side, the information provided by users of social networks can be false and even users may be fakes or bots. in , goodchild in [ ] raised questions concerning the quality of geospatial data: despite that a hierarchical manual verification is the most reliable data verification method, it was stated that automatic methods could efficiently identify not only false but questionable data. in paper [ ] , the method for pre-processing was presented, and only % of initial dataset was kept after filtering and cleaning process. 
one of the reasons for the emergence of fake geotags is a location spoofing. in [ ] , authors used the spatiotemporal cone to detect location spoofing on twitter. it was shown that in the new york city, the majority of fake geotags are located in the downtown manhattan, i.e., users tend to use popular places or locations in the city center as spoofing locations. the framework for the location spoofing detection was presented in [ ] . latent dirichlet allocation was used for the topic extraction. it was shown that message similarity for different users decreases with a distance increase. next, the history of user check-ins is used for the probability of visit calculation using bayes model. the problem of fake users and bots identification become highly important in the last years since some bots are designed to distort the reality and even to manipulate society [ ] . thus, for scientific studies, it is essential to exclude such profiles from the datasets. in [ ] , authors observed tweets with specific hashtags to identify patterns of spammers' posts. it was shown that in terms of the age of an account, retweets, replies, or follower-to-friend ratio there is no significant difference between legitimate and spammer accounts. however, the combination of different features of the user profile and the content allowed to achieve a performance of . auc [ ] . it was also shown that the part of bots among active accounts varies between % and %. this work was later improved by including new features such as time zones and device metadata [ ] . in contrast, other social networks do not actively share this information through a public api. in [ ] , available data from social network sites were studied, and results showed that social networks usually provide information about likes, reposts, and contacts, and keep the data about deleted friends, dislikes, etc., private. thus, advanced models with a high-level features are applicable only for twitter and cannot be used for social networks in general. more general methods for compromised accounts identification on facebook and twitter were presented in [ ] . the friends ratio, url ratio, message similarity, friend number, and other factors were used to identify spam accounts. some of these features were successfully used in later works. for example, in [ ] , seven features were selected to identify a regular user from a suspicious twitter account: mandatory -time, message source, language, and proximityand optional -topics, links in the text, and user interactions. the model achieved a high value of precision with approximately % of false positives. in [ ] , random forest classifier was used for spammers identification on twitter, which results in the accuracy of . %. this study was focused on five types of spam accounts: sole spammers, pornographic spammers, promotional spammers, fake profiles, and compromised accounts. nevertheless, these methods are usercentered, which means it is required to obtain full profile information for further analysis. however, there is a common situation where a full user profile is not available for researches, for example, in spatial analysis tasks. for instance, in [ ] , authors studied the differences between public streaming api of twitter and proprietary service twitter firehose. even though public api was limited to % sample of data, it provided % of geotagged data, but only % of all sample contains spatial information. 
in contrast, instagram users are on average times more likely post data with geotag comparing to twitter users [ ] . thus, lbsn data processing requires separate and more sophisticated methods that would be capable of identifying fake accounts considering incomplete data. in addition to that, modern methods do not consider cases when a regular user tags a false location for some reason, but it should be taken into account as well. as it was discussed above, it is critical to use as clean data as possible for research. however, different tasks require different aspects of data to be taken into consideration. in this work, we focus on the main features of the lbsn data: space, time, and messages content. first of all, any lbsn contains data with geotags and timestamps, so the proposed data processing methods are applicable for any lbsn. secondly, the logic and level of complexity of data cleaning depend on the study goals. for example, if some research is dedicated to studying daily activity patterns in a city, it is essential to exclude all data with wrong coordinates or timestamps. in contrast, if someone is interested in exploring the emotional representation of a specific place in social media, the exact timestamp might be irrelevant. in fig. , elements of a pipeline are presented along with the output data from each stage. as stated in the scheme, we start from general methods for a large scale analysis, which require fewer computations and can be applied on the city scale or higher. step by step, we eliminate accounts, places, and tags, which may mislead scientists and distort results. suspicious profiles identification. first, we identify suspicious accounts. the possibility of direct contact with potential customers attracts not only global brands or local business but spammers, which try to behave like real persons and advertise their products at the same time. since their goal differs from real people, their geotags often differ from the actual location, and they use tags or specific words for advertising of some service or product. thus, it is important to exclude such accounts from further analysis. the main idea behind this method is to group users with the same spatial activity patterns. for the business profiles such as a store, gym, etc. one location will be prevalent among the others. meanwhile, for real people, there will be some distribution in space. however, it is a common situation when people tag the city only but not a particular place, and depending on the city, coordinates of the post might be placed far from user's real location, and data will be lost among the others. thus, on the first stage, we exclude profiles, who do not use geotags correctly, from the dataset. we select users with more than ten posts with location to ensure that a person actively uses geotag functionality and commutes across the city. users with less than ten posts do not provide enough data to correctly group profiles. in addition, they do not contribute sufficiently to the data [ ] . then, we calculate all distances between two consecutive locations for each user and group them by m, i.e., we count all distances that are less than km, all distances between and km and so on. distances larger than km are united into one group. after that, we cluster users according to their spatial distribution. 
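a sketch of this first stage is given below: consecutive-post distances are computed with the haversine formula, binned into a fixed-width histogram with a single overflow bucket, normalised into a per-user spatial profile, and clustered. the bin width, the overflow cut-off and the use of k-means are assumptions, since the exact values and the clustering algorithm are not reproduced in this text.

import numpy as np
from sklearn.cluster import KMeans

def haversine_km(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def distance_profile(posts, bin_km=0.5, max_km=50.0):
    # posts: chronologically sorted (timestamp, lat, lon) tuples of one user
    dists = [haversine_km(p[1], p[2], q[1], q[2]) for p, q in zip(posts, posts[1:])]
    # distances beyond max_km are united into the last bucket
    clipped = np.minimum(dists, max_km)
    edges = np.arange(0.0, max_km + 2 * bin_km, bin_km)
    hist, _ = np.histogram(clipped, bins=edges)
    return hist / max(hist.sum(), 1)

def cluster_spatial_profiles(user_posts, n_clusters=5):
    # keep only users with more than ten geotagged posts, as described above
    users = [u for u, posts in user_posts.items() if len(posts) > 10]
    features = np.vstack([distance_profile(sorted(user_posts[u])) for u in users])
    assignment = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=0).fit_predict(features)
    return dict(zip(users, assignment))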
the cluster with a deficient level of spatial variations and with the vast majority of posts being in a single location represents business profiles and posts from these profiles can be excluded from the dataset. at the next step, we use a random forest (rf) classifier to identify bots, business profiles, and compromised accounts -profiles, which do not represent real people and behave differently from them. it has been proven by many studies that a rf approach is efficient for bots and spam detection [ , ] . since we want to keep our methods as general as possible and to keep our pipeline applicable to any social media, we consider only text message, timestamp, and location as feature sources for our model. we use all data that a particular user has posted in the studied area and extract the following spatial and temporal features: number of unique locations marked by a user, number of unique dates when a user has posted something, time difference in seconds between consecutive posts. for time difference and number of posts per date, we calculated the maximum, minimum, mean, and standard deviation. from text caption we have decided to include maximum, minimum, average, mean, standard deviation of following metrics: number of emojis per post, number of hashtags per post, number of words per post, number of digits used in post, number of urls per post, number of mail addresses per post, number of user mentions per post. in addition to that, we extracted money references, addresses, and phone numbers and included their maximum, minimum, average, mean, and standard deviation into the model. in addition, we added fraction of favourite tag in all user posts. thus, we got features in our model. as a result of this step, we obtain a list of accounts, which do not represent normal users. city background extraction. the next stage is dedicated to the extraction of basic city information such as a list of typical tags for the whole city area and a set of general locations. general locations are places that represent large geographic areas and not specific places. for example, in the web version of twitter user can only share the name of the city instead of particular coordinates. some social media like instagram or foursquare are based on a list of locations instead of exact coordinates, and some titles in this list represent generic places such as streets or cities. data from these places is useful in case of studying the whole area, but if someone is interested in studying actual temporal dynamics or spatial features, such data will distort the result. also, it should be noted that even though throughout this paper we use the word 'city' to reference the particular geographic area, all stages are applicable on the different scales starting from city districts and metropolitan regions to states, countries, or continents. firstly, we extract names of administrative areas from open street maps (osm). after that, we calculate the difference between titles in social media data and data from osm with the help of damerau-levenshtein distance. we consider a place to be general if the distance between its title and some item from the list of administrative objects is less than . these locations are excluded from the further analysis. for smaller scales such as streets or parks, there are no general locations. then, we analyze the distribution of tags mentions in the whole area. the term 'tag' denotes the important word in the text, which characterizes the whole message. 
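before moving on to the tag analysis, the suspicious-account classifier described at the start of this stage can be sketched as follows; only a handful of the listed features is shown (unique locations and dates, posting-gap statistics, per-post hashtag, url and mention counts), the regular expressions are rough approximations, and the hyperparameters are not taken from the paper.

import re
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def _stats(values):
    v = np.asarray(values, dtype=float)
    return [v.min(), v.max(), v.mean(), v.std()]

def user_features(posts):
    # posts: chronologically sorted (timestamp, location_id, text) tuples of one user
    ts = np.array([p[0] for p in posts], dtype=float)
    gaps = np.diff(ts) if len(ts) > 1 else np.array([0.0])
    hashtags = [len(re.findall(r"#\w+", p[2])) for p in posts]
    urls = [len(re.findall(r"https?://\S+", p[2])) for p in posts]
    mentions = [len(re.findall(r"@\w+", p[2])) for p in posts]
    return np.array([len({p[1] for p in posts}),              # unique locations
                     len({int(t // 86400) for t in ts})]      # unique posting dates
                    + _stats(gaps) + _stats(hashtags) + _stats(urls) + _stats(mentions))

def train_suspicious_account_classifier(user_posts, account_labels):
    # account_labels: {user_id: 1} for bots/business/compromised profiles, 0 otherwise
    users = list(account_labels)
    x = np.vstack([user_features(user_posts[u]) for u in users])
    y = np.array([account_labels[u] for u in users])
    return RandomForestClassifier(n_estimators=200, random_state=0).fit(x, y)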
usually, in lbsn, tags are represented as hashtags. however, they can also be named entities, topics, or terms. in this work, we use hashtags as an example of tags, but this concept can be further extrapolated on tags of different types. the most popular hashtags are usually related to general location (e.g., #nyc, #moscow) or a popular type of content (#photo, #picsoftheday, #selfie) or action (#travel, #shopping, etc.). however, these tags cannot be used to study separate places and they are not relevant either to places or to events since they are actively used in the whole area. nevertheless, scientists interested in studying human behavior in general can use this set of popular tags because it represents the most common patterns in the content. in this work, we consider tag as general if it was used in more than % of locations. however, it is possible to exclude tags related to public holidays. we want to avoid such situations and keep tags, which have a large spatial distribution but narrow peak in terms of temporal distribution. thus, we group all posts that mentioned a specific tag for the calendar year and compute their daily statistics. we then use the gini index g to identify tags, which do not demonstrate constant behavior throughout the year. if g ≥ . we consider tag as an event marker because it means that posts distribution have some peaks throughout the year. this pattern is common for national holidays or seasonal events such as sports games, etc. thus, after the second stage, we obtain the dataset for further processing along with a list of common tags and general locations for the studying area. spatial context extraction. using hashtags for events identification is a powerful strategy, however, there are situations where it might fail. the main problem is that people often use hashtags to indicate their location, type of activity, objects on photos and etc. thus, it is important to exclude hashtags which are not related to the possible event. to do that, we grouped all hashtags by locations, thus we learn which tags are widely used throughout the city and which are place related. if some tag is highly popular in one place, it is highly likely that the tag describes this place. excluding common place-related tags like #sea or #mall for each location, we keep only relevant tags for the following analysis. in other words, we get the list of tags which describe a normal state of particular places and their specific features. however, such tags cannot be indicators of events. fake transitions detection. the last stage of the pipeline is dedicated to suspicious posts identification. sometimes, people cannot share their thoughts or photos immediately. it leads to situations where even normal users have a bunch of posts, which are not accurate in terms of location and timestamp. at this stage, we exclude posts that cannot represent the right combination of their coordinates and timestamps. this process is similar to the ideas for location spoofing detection -we search for transitions, which someone could not make in time. the standard approach for detection of fake transitions is to use spacetime cones [ ] , but in this work, we suggest the improvement of this methodwe use isochrones for fake transitions identification. in urban studies, isochrone is an area that can be reached from a specified point in equal time. isochrone calculation is based on usage of real data about roads, that is why this method is more accurate than space-time cones. 
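a minimal sketch of the general-tag and event-marker checks follows; the coverage and gini thresholds are placeholders because the exact values are elided in the text.

```python
# Sketch of the "general tag" vs. event-marker decision; thresholds are placeholders.
import numpy as np
from collections import defaultdict

def gini(counts):
    """Gini index of a non-negative count vector (0 = uniform, ->1 = concentrated)."""
    x = np.sort(np.asarray(counts, dtype=float))
    if x.sum() == 0:
        return 0.0
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n

def classify_tags(posts, coverage_threshold=0.5, gini_threshold=0.6):
    """posts: iterable of (tag, location_id, day_of_year). Returns (general, event) tag sets."""
    tag_locations = defaultdict(set)
    tag_daily = defaultdict(lambda: np.zeros(366))
    locations = set()
    for tag, loc, day in posts:
        tag_locations[tag].add(loc)
        tag_daily[tag][day - 1] += 1
        locations.add(loc)
    general, event = set(), set()
    for tag, locs in tag_locations.items():
        if len(locs) / len(locations) > coverage_threshold:      # used all over the area
            if gini(tag_daily[tag]) >= gini_threshold:           # but concentrated in time
                event.add(tag)                                   # keep: likely holiday/event marker
            else:
                general.add(tag)                                 # drop from place-level analysis
    return general, event
```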
for isochrone calculation, we split the area into several zones depending on their distance from the observed point: a pedestrian walking area (all locations within a km radius), a car/public transport area (up to km), a train area ( - km), and a flight area (further than km). this distinction was made to define a maximum speed for every traveling distance. the time required for a specific transition is calculated by the following formula: t = Σ_i s_i / v, where s_i is the length of the i-th road segment along the route and v is the maximum possible velocity depending on the inferred type of transport. the road data was extracted from osm. it is important to note that at each stage of the pipeline we obtain output data that will be excluded, such as suspicious profiles, baseline tags, etc. however, this data can also be used, for example, for training novel models for fake account detection. the first experiment was designed to highlight the importance of general location extraction. to do that, we used the points-of-interest dataset for moscow, russia. the raw data was extracted from facebook using the places api and contained , places. the final dataset for moscow contained , places, and general sites were identified. however, it should be noted that among the detected general locations were 'russia' ( , , visitors), 'moscow, russia' ( , , visitors), and 'moscow oblast' ( , visitors). for instance, the most popular non-general locations in moscow are sheremetyevo airport and red square, with only , and , check-ins, respectively. the itinerary construction is based on solving the orienteering problem with functional profits (opfp) with the help of the open-source framework fops [ ] . in this approach, locations are scored by their popularity and by farness distance. we used the following parameters for the ant colony optimization algorithm: ants per location and iterations of the algorithm, as stated in the original article. the time budget was set to h, red square was selected as the starting point, and vorobyovy gory was used as the finish point, since these are two highly popular tourist places in the city center. the resulting routes are presented in fig. . both routes contain extra places, including major parks in the city: gorky park and zaryadye park. however, there are several distinctions between these routes. the route based on the raw data contains four general places (fig. , left) - 'moscow', 'moscow, russia', 'russia', and 'khamovniki district' - which do not correspond to actual places. thus, % of the locations in the route cannot be visited in real life. in contrast, in the case of the clean data (fig. , right), instead of general places the algorithm was able to add real locations, such as the bolshoi theatre and the central children's store on lubyanka, which has the largest clock mechanism in the world and an observation deck with a view of the kremlin. thus, the framework was able to construct a much better itinerary without any additional improvements in algorithms or methods. to demonstrate the value of the background analysis and typical hashtag extraction stages, we investigated a scenario of analyzing users' opinions in a geographical area via sentiment analysis. we used a combined dataset of twitter and vk.com posts taken in sochi, russia, during . sochi is one of the largest and most popular russian resorts. it was also the host of the winter olympics in . since twitter and vk.com provide geospatial data with exact coordinates, we created a square grid with a cell size equal to m. we then kept only cells containing data (fig.
, right) - cells in total. each cell was considered as a separate location for the context extraction. the most popular tags in the area are presented in fig. (left) . tag '#sochi' was mentioned in / of cells ( and cells for russian and english versions of the tag, respectively). the followup tags '#sochifornia' (used in cells) and '#sea' (mentioned in cells) were twice less popular. after that, we extracted typical tags for each cell. we considered a post to be relevant to the place if it contained at least one typical tag. thus, we can be sure that posts represent the sentiment in that area. the sentiment analysis was executed in two stages. first, we prepare the text for polarity detection. to do that, we delete punctuation, split text in words, and normalized text with the help of [ ] . in the second step, we used the russian sentiment lexicon [ ] to get the polarity of each word ( indicates positive word and − negative word). the sentiment of the text is defined as if a sum of polarities of all words more than zero and − if the sum is less than zero. the sentiment of the cell is defined as an average sentiment of all posts. on the fig. , results of sentiment analysis are presented, cells with average sentiment less than . were marked as neutral. it can be noted from maps that after the filtering process, more cells have a higher level of sentiment. for sochi city center, the number of posts with the sentiment |s| ≥ . increased by . %. it is also important that number of uncertain cell with sentiment rate . ≤ |s| ≤ . decreased by . % from to cells. thus, we highlighted the strong positive and negative areas and decreased the number of uncertain areas by applying the context extraction stage of the proposed pipeline. in this experiment, we applied the full pipeline on the instagram data. new york city was used as a target city in the event detection approach [ ] we collected the data from over , locations for a period of up to years. the total number of posts extracted from the new york city area is , , . in the first step, we try to exclude from the dataset all users who provide incorrect data, i.e. use several locations instead of the whole variety. we group users with the help of k-means clustering method. the appropriate number of clusters was obtained by calculating the distortion parameter. deviant cluster contained , users out of , , . the shape of deviant clusters can be seen in fig. . suspicious profiles mostly post in the same location. meanwhile, regular users have variety in terms of places. after that, we trained our rf model using manually labelled data from both datasets. the training dataset contains profiles with ordinary users and fake users; test data consists of profiles including normal profiles and suspicious accounts. the model distinguishes a regular user from suspicious successfully. normal user were detected correctly and users were marked as suspicious. suspicious users out of were correctly identified. thus, there were obtained % of precision and % of recall. since the goal of this work is to get clean data as a result, we are interested in a high value of recall and precision is less critical. as a result, we obtained a list of , , profiles which related to real people. at the next step, we used only data from these users to extract background information about cities. titles of general locations were derived for new york. these places were excluded from further analysis. 
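the cell-level sentiment step described above might be sketched as follows; the tokenizer, the lexicon format and the neutrality cut-off stand in for the elided details of the normalization and lexicon used by the authors.

```python
# Minimal sketch of the per-cell sentiment aggregation; lexicon and cut-off are stand-ins.
import re

def post_sentiment(text, lexicon):
    """lexicon: dict word -> +1 (positive) or -1 (negative). Returns -1, 0 or +1."""
    words = re.findall(r"\w+", text.lower())          # stands in for the normalization step
    total = sum(lexicon.get(w, 0) for w in words)
    return (total > 0) - (total < 0)

def cell_sentiment(posts, typical_tags, lexicon, neutral_cutoff=0.3):
    """posts: list of (text, tags) falling in one grid cell.
    Only posts carrying at least one tag typical for the cell are kept."""
    kept = [text for text, tags in posts if set(tags) & typical_tags]
    if not kept:
        return None
    avg = sum(post_sentiment(t, lexicon) for t in kept) / len(kept)
    return 0.0 if abs(avg) < neutral_cutoff else avg   # low-magnitude cells marked neutral
```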
after that, we extracted general hashtags; the example of popular tags in location before and after background tags extraction is presented on the fig. . general tags contain mostly different term related to toponyms and universal themes such as beauty or life. then, we performed the context extraction for locations. for each location typical hashtags were identified as % most frequent tags among users. we consider all posts from one user in the same location as one to avoid situations where someone tries to force their hashtag. we will use extracted lists to exclude typical tags from posts. after that, we calculated isochrones for each normal users to exclude suspicious posts from data. in addition to that, locations with a high rate of suspicious posts ( % or higher part of posts in location was detected as suspicious) were excluded as well. there was locations in new york city. the final dataset for new york consists of , locations. for event detection we performed the same experiment which was described in [ ] . in the original approach the spike in activity in particular cell of the grid was consider as an event. to find these spikes in data, historical grids is created using retrospective data for a calendar year. since we decrease amount of data significantly, we set threshold value to . we used data for to create grids, then we took two weeks from for the result evaluation: a week with a lot of events during - of march and an ordinary week with less massive events - february. the results of the recall evaluation are presented in table . as can be seen from the table on an active week, the recall increment was . % and for nonactive week recall value increase on . %. it is also important to note that some events, which do not have specific coordinates, such as snowfall in march or saint patrick's day celebration, were detected in the less number of places. this leads to lesser number of events in total and more significant contribution to the false positive rate. nevertheless, the largest and the most important events, such as nationwide protest '#enough! national school walkout' and north american international toy fair are still detected from the very beginning. in addition to that due to the altered structure of historical grids, we were able to discover new events such as a concert of canadian r&b duo 'dvsn', global engagement summit at un headquarters, etc. these events were covered with a low number of posts and stayed unnoticed during the original experiment. however, the usage of clean data helped to highlight small events which are essential for understanding the current situation in the city. in this work, we presented a spatiotemporal filtering pipeline for data preprocessing. the main goal of this process is to exclude unreliable data in terms of space and time. the pipeline consists of four stages: during the first stage, suspicious user profiles are extracted from data with the help of k-means clustering and random forest classifier. on the next stage, we exclude the buzz words from the data and filter locations related to large areas such as islands or city districts. then, we identify the context of a particular place expressed by unique tags. in the last step, we find suspicious posts using the isochrone method. stages of the pipeline can be used separately and for different tasks. for instance, in the case of touristic walking itinerary construction, we used only general location extraction, and the walking itinerary was improved by replacing % of places. 
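the fake-transition filter applied in this experiment can be illustrated with the feasibility test below; the authors compute isochrones from osm road data, whereas this sketch uses straight-line distances and hypothetical speed bands purely to show the structure of the check.

```python
# Sketch of the fake-transition check from the last pipeline stage. Straight-line
# distance and the speed bands are simplifying assumptions for illustration only.
def min_travel_time_h(distance_km):
    """Lower bound on travel time: pick a maximum speed from the distance band."""
    if distance_km <= 5:        # hypothetical pedestrian band
        v = 5.0                 # km/h on foot
    elif distance_km <= 50:     # hypothetical car / public transport band
        v = 90.0
    elif distance_km <= 500:    # hypothetical train band
        v = 200.0
    else:                       # flight band
        v = 900.0
    return distance_km / v

def is_fake_transition(prev_post, next_post, dist_fn):
    """Posts are (timestamp_hours, lat, lon); dist_fn returns km between two points."""
    dt = next_post[0] - prev_post[0]
    d = dist_fn(prev_post[1:], next_post[1:])
    return dt < min_travel_time_h(d)    # cannot be covered in the available time
```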
in the experiment dedicated to sentiment analysis, we used a context extraction method to keep posts that are related to the area where they were taken, and as a result, . % of uncertain areas were identified either as neutral or as strongly positive or negative. in addition to that, for event detection, we performed all stages of the pipeline, and recall for event detection method increased by . %. nevertheless, there are ways for further improvement of this pipeline. in instagram, some famous places such as times square has several corresponding locations including versions in other languages. this issue can be addressed by using the same method from the general location identification stage. we can use distance to find places with a similar name. currently, we do not address the repeating places in the data since it can be a retail chain, and some retail chains include over a hundred places all over the city. in some cases, it can be useful to interpret a chain store system as one place. however, if we want to preserve distinct places, more complex methods are required. despite this, the applicability of the spatiotemporal pipeline was shown using the data from facebook, twitter, instagram, and vk.com. thus, the pipeline can be successfully used in various tasks relying on location-based social network data. deep" learning for missing value imputation in tables with non-numerical data right time, right place" health communication on twitter: value and accuracy of location information social media geographic information: why social is special when it goes spatial building sentiment lexicons for all major languages positional accuracy of twitter and instagram images in urban environments a location spoofing detection method for social networks compa: detecting compromised accounts on social networks the rise of social bots the quality of big (geo)data urban computing leveraging location-based social network data: a survey zooming into an instagram city: reading the local through social media an agnotological analysis of apis: or, disconnectivity and the ideological limits of our knowledge of social media advances in social media research: past, present and future morphological analyzer and generator for russian and ukrainian languages efficient pre-processing and feature selection for clustering of cancer tweets analyzing user activities, demographics, social network structure and user-generated content on instagram is the sample good enough? comparing data from twitter's streaming api with twitter's firehose orienteering problem with functional profits for multi-source dynamic path construction fake news detection on social media: a data mining perspective who is who on twitter-spammer, fake or compromised account? a tool to reveal true identity in real-time twitter as an indicator for whereabouts of people? correlating twitter with uk census data detecting spammers on social networks online humanbot interactions: detection, estimation, and characterization multiscale event detection using convolutional quadtrees and adaptive geogrids places nearby: facebook as a location-based social media platform arming the public with artificial intelligence to counter social bots detecting spam in a twitter network true lies in geospatial big data: detecting location spoofing in social media acknowledgement. this research is financially supported by the russian science foundation, agreement # - - . 
key: cord- -b vw r o authors: morales, alex; narang, kanika; sundaram, hari; zhai, chengxiang title: crowdqm: learning aspect-level user reliability and comment trustworthiness in discussion forums date: - - journal: advances in knowledge discovery and data mining doi: . / - - - - _ sha: doc_id: cord_uid: b vw r o community discussion forums are increasingly used to seek advice; however, they often contain conflicting and unreliable information. truth discovery models estimate source reliability and infer information trustworthiness simultaneously in a mutual reinforcement manner, and can be used to distinguish trustworthy comments with no supervision. however, they do not capture the diversity of word expressions and learn a single reliability score for the user. crowdqm addresses these limitations by modeling the fine-grained aspect-level reliability of users and incorporate semantic similarity between words to learn a latent trustworthy comment embedding. we apply our latent trustworthy comment for comment ranking for three diverse communities in reddit and show consistent improvement over non-aspect based approaches. we also show qualitative results on learned reliability scores and word embeddings by our model. users are increasingly turning to community discussion forums to solicit domain expertise, such as querying about inscrutable political events on history forums or posting a health-related issue to seek medical suggestions or diagnosis. while these forums may be useful, due to almost no regulations on post requirements or user background, most responses contain conflicting and unreliable information [ ] . this misinformation could lead to severe consequences, especially in health-related forums, that outweigh the positive benefits of these communities. currently, most of the forums either employ moderators to curate the content or use community voting. however, both of these methods are not scalable [ ] . this creates a dire need for an automated mechanism to estimate the trustworthiness of the responses in the online forums. in general, the answers written by reliable users tend to be more trustworthy, while the users who have written trustworthy answers are more likely to be reliable. this mutual reinforcement, also referred to as the truth discovery principle, is leveraged by previous works that attempt to learn information trustworthiness in the presence of noisy information sources with promising results [ , , , ] . this data-driven principle particularly works for community forums as they tend to be of large scale and exhibit redundancy in the posts and comments. community discussion forums usually encompass various topics or aspects. a significant deficiency of previous work is the lack of aspect-level modeling of a user's reliability. this heterogeneity is especially true for discussion forums, like reddit, with communities catering to broad themes; while within each community, questions span a diverse range of sub-topics. intuitively, a user's reliability will be limited to only a few topics, for instance, in a science forum, a biologist could be highly knowledgeable, and in turn reliable, when she answers biology or chemistry-related questions but may not be competent enough for linguistic queries. another challenge is the diversity of word expressions in the responses. truth discovery based approaches treat each response as categorical data. however, in discussion forums, users' text responses can include contextually correlated comments [ ] . 
for instance, in the context of a post describing symptoms like "headache" and "fever", either of the related responses of a viral fever or an allergic reaction can be a correct diagnosis. on the other hand, unrelated comments on the post should be deemed unreliable; for instance, a comment giving a diagnosis of "bone fracture" for the above symptoms. crowdqm addresses both limitations by jointly modeling the aspect-level user reliability and the latent trustworthy comment in an optimization framework. in particular, ( ) crowdqm learns user reliability over the fine-grained topics discussed in the forum, and ( ) our model captures the semantic meaning of comments and posts through word embeddings. we learn a trustworthy comment embedding for each post, such that it is semantically similar to comments of reliable users on the post and also similar to the post's context. contrary to earlier approaches [ , , ] , we propose an unsupervised model for comment trustworthiness that does not need labeled training data. we verified our proposed model on the comment ranking task based on trustworthiness for three ask* subreddit communities. our model outperforms state-of-the-art baselines in identifying the most trustworthy responses, as deemed by community experts and community consensus. we also show the effectiveness of our aspect-based user reliability estimation and word embeddings qualitatively. further, our improved model of reliability enables us to identify reliable users per aspect discussed in the community. a challenge in applying truth discovery to discussion forums is capturing the variation in users' reliability and the diversity of word usage in the answers. to address it, we model aspect-level user reliability and use semantic representations for the comments. each submission is a post, i.e., a question, which starts a discussion thread, while a comment is a response to a submission post. formally, each submission post, m, is associated with a set of terms, c_m. a user, n, may reply with a comment on submission m, with a set of terms w_{m,n}. v is the vocabulary set comprising all terms present in our dataset, i.e., all submissions and comments. each term ω ∈ v has a corresponding word-vector representation, or word embedding, v_ω ∈ r^d. thus, we can represent a post's embedding in terms of its constituent terms, {v_c}, ∀c ∈ c_m. to capture the semantic meaning, we represent each comment as the mean word-vector representation of its constituent terms. formally, we represent the comment given on post m by user n as the comment embedding a_{m,n} = |w_{m,n}|^{-1} Σ_{ω ∈ w_{m,n}} v_ω. there are k aspects or topics discussed in the forum, and each post and comment can be composed of multiple aspects. we denote submission m's distribution over these aspects as the post-aspect distribution, p_m ∈ r^k. similarly, we also compute the user-aspect distribution, u_n ∈ r^k, learned over all the comments posted by user n in the forum. this distribution captures the familiarity (or frequency) of user n with each aspect based on their activity in the forum. each user n also has a user reliability vector defined over the k aspects, r_n ∈ r^k. the reliability captures the likelihood of the user providing a trustworthy comment about a specific aspect. note that high familiarity with an aspect does not always imply high reliability in the same aspect. for each submission post m associated with a set of responses {a_{m,n}}, our goal is to estimate the real-valued vector representations, or latent trustworthy comment embeddings, a*_m ∈ r^d.
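a small sketch of the representation layer assumed here: a comment embedding is the mean of its term vectors and the post is represented by the set of its term vectors; the source of the word vectors (e.g. a word2vec model trained on the forum) is an assumption.

```python
# Sketch of the comment/post representations used by the model.
import numpy as np

def comment_embedding(terms, word_vectors):
    """terms: iterable of tokens W_{m,n}; word_vectors: dict term -> np.ndarray of dim d."""
    vecs = [word_vectors[t] for t in terms if t in word_vectors]
    if not vecs:
        return None
    return np.mean(vecs, axis=0)          # a_{m,n} = (1/|W_{m,n}|) * sum of term vectors

def post_context(terms, word_vectors):
    """Returns the list {v_c} of embeddings for the post's terms C_m."""
    return [word_vectors[t] for t in terms if t in word_vectors]
```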
we also simultaneously infer the user reliability vectors {r_n} and update the word embeddings {v_ω}. the latent trustworthy comment embedding, a*_m, can be used to rank the current comments on the post. our model follows the truth discovery principle: a trustworthy comment is supported by many reliable users and vice versa. in other words, the weighted error between the trustworthy comment and the given comments on the post is minimized, where user reliabilities provide the weights. we extend the approach to use aspect-level user reliability and compute a post-specific reliability weight. we further compute the error in terms of the embeddings of posts and comments to capture their semantic meaning. in particular, we minimize the embedding error, e_{m,n} = ||a*_m − a_{m,n}||^2, i.e., the squared error between the learned trustworthy comment embedding a*_m and the comment embedding a_{m,n} on post m. this error ensures that the trustworthy comment is semantically similar to the comments given for the post. next, to ensure context similarity of the comments with the post, we compute the context error, q_{m,n} = |c_m|^{-1} Σ_{c ∈ c_m} ||a_{m,n} − v_c||^2, reducing the difference between the comment embedding and the post embeddings. the key idea is similar to that of the distributional hypothesis: if two comments co-occur a lot in similar posts, they should be closer in the embedding space. further, these errors are weighted by the aspect-level reliability of the user providing the comment. we estimate the reliability of user n for a specific post m through the user-post reliability score, r_{m,n} = ||s(u_n, p_m) ∘ r_n||, where the symbol ∘ represents the hadamard product. this score computes the magnitude of the user reliability vector, r_n, weighted by the similarity function s(.). the similarity function s(u_n, p_m) captures the user's familiarity with the post's context by computing the product of the aspect distributions of user n and post m. thus, to get a high user-post reliability score, r_{m,n}, the user should be both reliable and familiar with respect to the aspects discussed in the post. (fig. : an illustrative toy example detailing our model components. the left-hand side details the user-post reliability score estimation, r_{m,n}, which is a function of the similarity s(.) between the user and post aspect distributions and of the user aspect reliabilities, r_n. on the right-hand side, we learn the trustworthy comment embedding, a*_m, such that it is similar to the user comments, a_{m,n}, which are, in turn, similar to the post context v_c.) finally, these errors are aggregated over all the users and their comments. thus, we define our objective function as min Σ_m Σ_n r_{m,n} · (e_{m,n} + β · q_{m,n}), where n is the number of users. the term r_{m,n} · e_{m,n} ensures that the latent trustworthy comment embeddings are most similar to the comment embeddings of reliable users for post m, while r_{m,n} · q_{m,n} ensures trust-aware learning of contextualized comment embeddings. the hyperparameter β controls the importance of the context error in our method. the exponential regularization constraint, Σ_{n=1}^{n} e^{−r_n^{(k)}} = 1 for each k, ensures that the reliabilities across users are nonzero. figure shows an overview of our model using a toy example of a post in a medical forum with flu-like symptoms. the commenters describing flu-related diagnoses are deemed more reliable for this post. we use coordinate descent [ ] to solve our optimization problem. in particular, we solve the equation for each variable while keeping the rest fixed.
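the quantities just defined can be sketched as follows; the element-wise form of s(.), the squared-norm errors and the shape of the objective follow the reconstruction above and should be read as assumptions rather than the exact published formulation.

```python
# Sketch of the user-post reliability score, the two error terms, and the objective.
import numpy as np

def user_post_reliability(u_n, p_m, r_n):
    """r_{m,n} = || s(u_n, p_m) ∘ r_n || with s taken as the element-wise product of the
    user and post aspect distributions (all vectors have K entries)."""
    return np.linalg.norm(u_n * p_m * r_n)

def embedding_error(a_star_m, a_mn):
    """E_{m,n}: squared distance between trustworthy and observed comment embeddings."""
    return np.sum((a_star_m - a_mn) ** 2)

def context_error(a_mn, post_term_vectors):
    """Q_{m,n}: average squared distance between the comment and the post's term vectors."""
    return np.mean([np.sum((a_mn - v_c) ** 2) for v_c in post_term_vectors])

def objective(posts, beta=0.5):
    """posts: list of dicts with keys 'a_star' and
    'comments' = [(a_mn, u_n, p_m, r_n, post_term_vectors)]."""
    total = 0.0
    for post in posts:
        for a_mn, u_n, p_m, r_n, terms in post["comments"]:
            r = user_post_reliability(u_n, p_m, r_n)
            total += r * (embedding_error(post["a_star"], a_mn) + beta * context_error(a_mn, terms))
    return total
```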
thus, the latent trustworthy comment is a weighted combination of the comments, where the weights are provided by the user-post reliability scores r_{m,n}. alternatively, it can also be interpreted as a reliable summarization of all the comments. the reliability of a user in aspect k is inversely proportional to the errors with respect to the latent trustworthy comment a*_m (e_{m,n}) and the submission's context v_c (q_{m,n}) over all of her posted comments (m_n). the embedding error ensures that if there is a large difference between the user's comment and the trustworthy comment, her reliability becomes lower. the context error ensures that comments that are not relevant to the post's context are penalized heavily. in other words, a reliable user should give trustworthy and contextualized responses to posts. this error is further weighted by the similarity score, s(.), capturing the familiarity of the user with the post's context. thus, familiar users are penalized more for their mistakes than unfamiliar users. the closed-form update for a word embedding v_ω is defined over the pairs ⟨m, n⟩ ∈ d_ω = {(m, n) | ω ∈ w_{m,n}}, with a^{−ω}_{m,n} = |w_{m,n}|^{-1} Σ_{ω′ ∈ w_{m,n}\{ω}} v_{ω′} denoting the contribution of the remaining terms of the comment. to update v_ω, we only consider those comment and submission pairs, d_ω, in which the particular word appears. the update of the embeddings depends on the submission context v_c, the latent trustworthy comment embedding a*_m, as well as the user-post reliability score r_{m,n}. thus, word embeddings are updated in a trust-aware manner such that reliable users' comments weigh more than those of unreliable users, since the latter can contain noisy text. note that there is also some negative dependency on the contribution of the other terms in the comments. we used the popular latent dirichlet allocation (lda) [ ] to estimate the aspects of the posts in our dataset. specifically, we combined the title and body text to represent each post. we applied topic model inference to all comments of user n to compute her combined aspect distribution, u_n. we randomly initialized the user reliability, r_n. we initialized the word embeddings, v_ω, via word2vec [ ] trained on our dataset. we used both unigrams and bigrams in our model. we fixed β to . . the model converges after only about six iterations, indicating quick approximation. in general, the computational complexity is o(|v|nm); however, we leverage the data sparsity in the comment-word usage and user-posts for an efficient implementation. in this section, we first discuss our novel dataset, followed by experiments on the outputs learned by our model. in particular, we evaluate the trustworthy comment embeddings on the comment ranking task, while we qualitatively evaluate the user reliabilities and word embeddings. for brevity, we focus the qualitative analysis on our largest subreddit, askscience. we evaluate our model on the widely popular discussion forum reddit. reddit covers diverse topics of discussion and is challenging due to the prevalence of noisy responses. we specifically tested on ask* subreddits as they are primarily used to seek answers on a variety of topics, from mundane issues to serious medical concerns. in particular, we crawled data from three subreddits, /r/askscience, /r/askhistorians, and /r/askdocs, from their inception until october . while these subreddits share the same platform, the communities differ vastly, see table . we preprocessed the data by removing uninformative comments and posts that either have fewer than ten characters, contain only urls, or have a missing title or author information.
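one coordinate-descent sweep implied by these updates might look as follows; the -log form of the reliability update is an assumption suggested by the exponential constraint (as in crh-style truth discovery), not a verbatim transcription of the paper's closed-form solution.

```python
# Sketch of a single coordinate-descent sweep over the model variables.
import numpy as np

def update_trustworthy_comment(comments):
    """comments: list of (a_mn, r_mn). Weighted average with user-post reliabilities."""
    weights = np.array([r for _, r in comments])
    vecs = np.stack([a for a, _ in comments])
    return (weights[:, None] * vecs).sum(axis=0) / weights.sum()

def update_user_reliability(per_aspect_error, eps=1e-12):
    """per_aspect_error: accumulated (E + beta*Q) error per aspect for one user.
    Higher error -> lower reliability; the -log form is an assumed CRH-style update
    consistent with the exponential regularization constraint."""
    total = per_aspect_error.sum() + eps
    return -np.log(per_aspect_error / total + eps)

def coordinate_descent_sweep(posts, user_errors):
    """posts: dict m -> list of (user_id, a_mn, r_mn);
    user_errors: dict n -> per-aspect error accumulator (np.ndarray of length K)."""
    a_star = {m: update_trustworthy_comment([(a, r) for _, a, r in cs]) for m, cs in posts.items()}
    reliability = {n: update_user_reliability(err) for n, err in user_errors.items()}
    return a_star, reliability
```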
we removed users who have posted less than two comments and also submissions with three or fewer comments. to handle sparsity, we treated all users with a single comment as "unk". for each submission post, there is an associated flair text denoting the category of the post, referred to as the submission flair that is either moderator added or self-annotated,e.g., physics, chemistry, biology. similarly, users have author flairs attributed next to their user-name describing their educational background, e.g., astrophysicist, bioengineering. only users verified by the moderator have author flairs, and we denote them as experts in the rest of the paper. askdocs does not have submission flairs as it is a smaller community. for both subreddits, we observed that around % of the users comment on posts from more than two categories. experts are highly active in the community answering around - % of the posts (table ) . askscience and askhistorians have significantly higher (fig. ) and more detailed comments (|w m,n | in table ) per post than askdocs. due to the prevalence of a large number of comments, manual curation is very expensive, thus necessitating the need for an automatic tool to infer comments trustworthiness. we evaluate latent trustworthy comment learned by our model on a trustworthy comment ranking task. that is, given a submission post, our goal is to rank the posted comment based on their trustworthiness. for this experiment, we treat expert users' comment as the most trustworthy comment of the post. besides, we also report results using the highest upvoted comment as the gold standard. highest upvoted comments represent community consensus on the most trustworthy response for the post [ ] . in particular, we rank comments for each post m, in the order of descending cosine similarity between their embedding, a m,n , and the latent trustworthy comment embeddings, a * m . we then report average precison@k values over all the posts, where k denotes the position in the output ranked list of comments. baselines: we compare our model with state-of-the-art truth discovery methods proposed for continuous and text data and non-aspect version of our model . in this baseline, we represent the trustworthy comment for a post as the mean comment embedding and thus assume uniform user reliability. crh : is a popular truth discovery-based model for numerical data [ ] . crh minimizes the weighted deviation of the trustworthy comment embedding from the individual comment embeddings with user reliabilities providing the weights. catd: is an extension of crh that learns a confidence interval over user reliabilities to handle data skewness [ ] . for both the above models, we represent each comment as the average word embeddings of its constituent terms. trustanswer : li et al. [ ] modeled semantic similarity between comments by representing each comment with embeddings of its key phrase. crowdqm-no-aspect: in this baseline, we condense the user's aspect reliabilities to a single r n . this model acts as a control to gauge the performance of our proposed model. table a reports the precision@ results using expert's comments as the gold standard. mboa, with uniform source reliability, outperforms the crh method that estimates reliability for each user separately. thus, simple mean embeddings provide a robust representation for the trustworthy comment. we also observe that crowdqm-no-aspect performs consistently better than trustanswer. 
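the evaluation protocol described here is straightforward to sketch: rank each post's comments by cosine similarity to the latent trustworthy embedding and average precision@k over posts, with the gold set given by expert or most-upvoted comments.

```python
# Sketch of the comment-ranking evaluation (precision@k against a gold comment set).
import numpy as np

def cosine(a, b, eps=1e-12):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def rank_comments(a_star, comments):
    """comments: list of (comment_id, embedding). Returns ids sorted by similarity."""
    return [cid for cid, _ in sorted(comments, key=lambda c: cosine(a_star, c[1]), reverse=True)]

def precision_at_k(ranked_ids, gold_ids, k):
    """Fraction of the top-k ranked comments that are in the gold set."""
    return sum(1 for cid in ranked_ids[:k] if cid in gold_ids) / k

def average_precision_at_k(posts, k):
    """posts: list of (a_star, comments, gold_ids); averages precision@k over posts."""
    vals = [precision_at_k(rank_comments(a, c), g, k) for a, c, g in posts]
    return sum(vals) / len(vals)
```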
note that both approaches do not model aspect-level user reliability but use semantic representations of comments. however, while trustanswer assigns a single reliability score for each comment, crowdqm-no-aspect additionally takes into account the user's familiarity with the post's context (similarity function, s(.)) to compute her reliability for the post. finally, crowdqm consistently outperforms both the models, indicating that aspect modeling is beneficial. catd uses a confidence-aware approach to handle data skewness and performs the best among the baselines. this skewness is especially helpful in reddit as experts are the most active users (table ) ; and, catd likely assigns them high reliability. our model achieves competitive precision as catd for askdocs while outperforming for the others. this indicates that our data-driven model works better for communities which are less sparse (sect. . and fig. ). table . precision@ for all three ask* subreddits, with ( a) the experts' comments and ( b) upvotes used to identify trustworthy comments. table b reports precision@ results using community upvoted comments as the gold standard, while fig. a plots the precision values against the size of the output ranked comment list. in general, there is a drop in performance for all models on this metric because it is harder to predict upvotes as they are inherently noisy [ ] . trustanswer and crowdqm-no-aspect perform best among the baselines indicating that modeling semantic representation is essential for forums. crowdqm again consistently outperforms the non-aspect based models verifying that aspect modeling is needed to identify trustworthy comments in forums. crowdqm remains competitive in the smaller askdocs dataset, where the best performing model is moba. thus, for askdocs, the comment summarizing all other comments tends to get the highest votes. parameter sensitivity. in fig. b , we plot our model's precision with varying number of aspects. although there is an optimal range around aspects, the precision remains relatively stable indicating that our model is not sensitive to aspects. we also did similar analysis with β and did not find any significant changes to the precision. we evaluate learned user reliabilities through users commenting on a post with a submission flair. note that a submission flair is manually curated and denotes post's category, and this information is not used in our model. specifically, for each post m, we compute the user-post reliability score, r m,n , for every user n who commented on the post. we then ranked these scores for each category and report top author flairs for few categories in table . the top author flairs for each category are domain experts. for instance, for the computing category highly reliable users have author flairs like software engineering and machine learning, while for linguistics authors with flairs hispanic sociolinguistics and language documentation rank high. these results align with our hypothesis that in-domain experts should have higher reliabilities. we also observe out of domain authors with flairs like comparative political behavior and nanostructured materials in the linguistic category. this diversity could be due to the interdisciplinary nature of the domain. our model, thus, can be used by the moderators of the discussion forum to identify and recommend potential reliable users to respond to new submission posts of a particular category. 
to further analyze the user reliability, we qualitatively examine the aspects with the largest reliability value of highly upvoted users in a post category. first, we identify users deemed reliable by the community for a category through a karma score. category-specific user karma is given by the average upvotes the user's comments have received in the category. we then correlate the categoryspecific user karma with her reliability score in each k ∈ k aspect, r (k) n to identify aspects relevant for that category. figure shows the top words of the highest correlated aspects for some categories. the identified words are topically relevant thus our model associates aspect level user reliability coherently. interestingly, the aspects themselves tend to encompass several themes, for example, in the health category, the themes are software and health. the crowdqm model updates word embeddings to better model semantic meaning of the comments. for each category, we identify the frequent terms and find its most similar keywords using cosine distance between the learned word embeddings. air subject food starts rolling mechanics brain production fire itself material "yes" then complete antimatter galaxies mathematical "dark" matter size the left column for each term in table are the most similar terms returned by the initial embeddings while the right column reports the results from updated embeddings {v ω } from our crowdqm model. we observe that there is a lot of noise in words returned by the initial model as they are just co-occurrence based while words returned by our model are semantically similar and describe similar concepts. this improvement is because our model updates word embeddings in a trust aware manner such that they are similar to terms used in responses from reliable users. our work is related to two main themes of research, truth discovery and community question answering (cqa). truth discovery: truth discovery has attracted much attention recently. different approaches have been proposed to address different scenarios [ , , ] . most of the truth discovery approaches are tailored to categorical data and thus assume there is a single objective truth that can be derived from the claims of different sources [ ] . faitcrowd [ ] assumes an objective truth in the answer set and uses a probabilistic generative model to perform fine-grained truth discovery. on the other hand, wan et al. [ ] propose trustworthy opinion discovery where the true value of an entity is modeled as a random variable with a probability density function instead of a single value. however, it still fails to capture the semantic similarity between the textual responses. some truth discovery approaches also leverage text data to identify correct responses effectively. li et al. [ ] proposed a model for capturing semantic meanings of crowd provided diagnosis in a chinese medical forum. zhang et al. [ ] also leveraged semantic representation of answers and proposed a bayesian approach to capture the multifactorial property of text answers. these approaches only use certain keywords to represent each answer and are thus, limited in their scope. also, they learn a scalar user reliability score. to the best of our knowledge, there has been no work that models both fine-grained user reliability with semantic representations of the text to discover trustworthy comments from community responses. community question answering: typically cqa is framed as a classification problem to predict correct responses for a post. 
most of the previous work can be categorized into feature-based or text relevance-based approaches. feature-driven models [ , , ] extract content or user based features that are fed into classifiers to identify the best comment. cqarank leverages voting information as well as user history and estimates user interests and expertise on different topics [ ] . barron-cedeno et al. [ ] also look at the relationship between the answers, measuring textual and structural similarities between them to classify useful and relevant answers. text-based deep learning models learn an optimal representation of question and answer pairs to identify the most relevant answer [ ] . in semeval task on cqa, nakov et al. [ ] developed a task to recommend related answers to a new question in the forum. semeval further extends this line of work by proposing fact checking in community question answering [ ] . it is not only expensive to curate each reply manually to train these models, but also unsustainable. on the contrary, crowdqm is an unsupervised method and thus does not require any labeled data. also, we estimate the comments' trustworthiness that implicitly assumes relevance to the post (modeled by these works). we proposed an unsupervised model to learn a trustworthy comment embedding from all the given comments for each post in a discussion forum. the learned embedding can be further used to rank the comments for that post. we explored reddit, a novel community discussion forum dataset for this task. reddit is challenging as posts typically receive a large number of responses from a diverse set of users and each user engages in a wide range of topics. our model estimates aspect-level user reliability and semantic representation of each comment simultaneously. experiments show that modeling aspect level user reliability improves the prediction performance compared to the non-aspect version of our model. we also show that the estimated user-post reliability can be used to identify trustworthy users for particular post categories. finding high-quality content in social media thread-level information for comment classification in community question answering nonlinear programming. athena scientific latent dirichlet allocation structural normalisation methods for improving best answer identification in question answering communities integrating conflicting data: the role of source dependence corroborating information from disagreeing views widespread underprovision on reddit which answer is best?: predicting accepted answers in mooc forums crowdsourced data management: a survey a confidence-aware approach for truth discovery on long-tail data resolving conflicts in heterogeneous data by truth discovery and source reliability estimation crowdsourcing high quality labels with a tight budget reliable medical diagnosis from crowdsourcing: discover trustworthy answers from non-experts a survey on truth discovery what we vote for? 
answer selection from user expertise view in community question answering faitcrowd: fine grained truth discovery for crowdsourced data aggregation semeval- task : fact checking in community question answering distributed representations of words and phrases and their compositionality truthcore: non-parametric estimation of truth from a collection of authoritative sources semeval- task : community question answering from truth discovery to trustworthy opinion discovery: an uncertainty-aware quantitative modeling approach sentence similarity learning by lexical decomposition and composition hybrid attentive answer selection in cqa with deep users modelling cqarank: jointly model topics and expertise in community question answering truth discovery with multiple conflicting information providers on the web texttruth: an unsupervised approach to discover trustworthy information from multi-sourced text data a probabilistic model for estimating real-valued truth from conflicting sources truth inference in crowdsourcing: is the problem solved? key: cord- -k upc xu authors: sanz-cruzado, javier; macdonald, craig; ounis, iadh; castells, pablo title: axiomatic analysis of contact recommendation methods in social networks: an ir perspective date: - - journal: advances in information retrieval doi: . / - - - - _ sha: doc_id: cord_uid: k upc xu contact recommendation is an important functionality in many social network scenarios including twitter and facebook, since they can help grow the social networks of users by suggesting, to a given user, people they might wish to follow. recently, it has been shown that classical information retrieval (ir) weighting models – such as bm – can be adapted to effectively recommend new social contacts to a given user. however, the exact properties that make such adapted contact recommendation models effective at the task are as yet unknown. in this paper, inspired by new advances in the axiomatic theory of ir, we study the existing ir axioms for the contact recommendation task. our theoretical analysis and empirical findings show that while the classical axioms related to term frequencies and term discrimination seem to have a positive impact on the recommendation effectiveness, those related to length normalization tend to be not desirable for the task. with the large-scale growth of social network platforms such as twitter or facebook, recommender systems technology that targets explicit social scenarios has seen a surge of interest [ , ] . as part of this trend, the adaptation of information retrieval (ir) approaches to recommend people to connect to in the network have been particularly studied [ , ] . this specific class of recommender systems has the interesting property that users play a dual role: they are the users to whom we want to provide recommendations, but they are also the items we want to recommend [ ] . recently, it has been shown that classical ir weighting models -such as bm -can not only be used, but are also effective and efficient for the contact recommendation task [ ] . in fact, recommender systems have always had strong connections with textual information retrieval (ir), since both tasks can be considered as particular cases of information filtering [ ] . these ties have been materialized in the design and development of recommendation approaches based on ir models [ , , ] . content-based recommender systems [ ] have been the most direct realization of such ties. 
however, we also note the collaborative filtering methods of [ , ] , which employed the vector space model or query likelihood to their advantage. in this paper, we analyze the reasons behind the effectiveness of ir approaches for the task of recommending contacts in social networks, through an exploratory analysis of the importance and validity of the fundamental ir axioms [ ] . we start our analysis by examining contact recommendation methods that directly adapt ir models [ ] , as they provide a bridge between existing work on axiomatic analysis in ir models, and this new task. in particular, we empirically analyze whether satisfying the ir axioms leads to an increase in the performances of the algorithms. interestingly, we find that while this is generally true, the axioms related to length normalization negatively impact the contact recommendation performance, since they interfere with a key evolutionary principle in social networks, namely preferential attachment [ ] . by identifying the set of properties that an ir model must (at least) follow to provide effective results, axiomatic thinking as developed by fang et al. [ ] has permitted to guide the development of both sound and effective ir approaches by explaining, diagnosing and improving them. in their seminal work, fang et al. [ ] proposed several heuristics (known as axioms) addressing different properties of the models such as the frequency of the query terms in the retrieved documents, the relative discrimination between query terms, or how a model deals with long documents. they also analyzed the effect such properties had on the effectiveness of state-of-the-art models such as bm [ ] or query likelihood [ ] , and found that, with minor modifications to adhere to the different proposed axioms, the modified ir models achieved an improved retrieval performance. since the seminal work of fang et al., the original axioms have been refined and expanded [ , ] , and other additional properties of effective ir models have been studied, such as the semantic relations between queries and documents [ ] or term proximity [ ] . recently, axiomatic analysis has been applied on neural ir models: rennings et al. [ ] proposed a method for empirically checking if the learned neural models fulfil the different ir axioms, while rosset et al. [ ] used the axioms as constraints for guiding the training of neural models. beyond ir, axiomatic analysis has also expanded to other areas such as recommender systems, where valcarce et al. [ , ] explored the benefits of penalizing users who rate lots of items when selecting neighbors in user-based knn approaches. in this paper, using the ir-based contact recommendation framework proposed by sanz-cruzado and castells [ ] as a basis, we map the ir axioms of fang et al. [ ] into the task of recommending people in social networks, and empirically analyze how valid and meaningful each axiom is for this task. we first introduce the notations we use during the rest of the paper. given a social network, we represent its structure as a graph g = u, e , where u denotes the set of people in the network and e is the set of relationships between users. for each user u ∈ u, we denote by Γ (u) the set of users with whom u has established relationships (the neighborhood of user u). in directed networks, three different neighborhoods can be considered depending on the link orientation: users who have a link towards u, Γ in (u); users towards whom u has a link, Γ out (u) ; and the union of both, Γ und (u). 
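for concreteness, the directed neighborhoods used in the rest of the section can be written down as follows; representing the graph as a set of (source, target) pairs is an illustrative choice.

```python
# Small sketch of the neighborhood notation for a directed social graph.
def gamma_in(u, edges):
    """Users with a link towards u."""
    return {v for v, w in edges if w == u}

def gamma_out(u, edges):
    """Users towards whom u has a link."""
    return {w for v, w in edges if v == u}

def gamma_und(u, edges):
    """Union of incoming and outgoing neighbors of u."""
    return gamma_in(u, edges) | gamma_out(u, edges)

def gamma_inv(u, edges):
    """Neighborhood of u in the graph with all link orientations reversed."""
    return gamma_out(u, {(w, v) for v, w in edges})
```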
we define Γ inv (u) as the inverse neighborhood of u, i.e. the neighborhood u would have if the orientation of the links is reversed. weighted networks additionally include a function w : unweighted networks can be seen as a particular case where w : u → { , }. then, given a target user u, the contact recommendation task consists of suggesting a subset of usersΓ out (u) ⊂ u \Γ out (u) towards whom u has no links but who might be of interest for u. we define the recommendation task as a ranking problem, in which the result setΓ out (u) is obtained and sorted by a ranking function f u : u \ Γ out (u) → r. since we explore the importance of ir axioms for contact recommendation, we need to establish connections between both tasks. we take for this purpose the mapping proposed in [ ] : we fold the three spaces in the ir task (documents, queries and terms) into a single space for people to people recommendation, namely the users in the network. we map queries and documents to the target and candidate users, respectively. we also use the neighbors of both target and candidate users as equivalent to the terms contained in the queries and documents. as proposed by sanz-cruzado and castells [ ] , we might use different neighborhoods to represent the target and candidate users (we could take either Γ in , Γ out or Γ und for each of them). we denote by Γ q (u) the neighborhood representing the target user, and by Γ d (v) the one for the candidate user. the frequency of a term t in a document is represented as an edge weight w d (v, t) in our mapping: where x is equal to one when the condition x is true, or otherwise. in textual ir, the frequency is the basis to establish a measure of how important a term is for a document, and it is always positive. therefore, we assume that w d ≥ , and w d (v, t) = if and only if t / ∈ Γ d (v). the higher the importance of the link (v, t), the higher the weight w d (v, t) should be. in our experiments (described in sect. ), we use the number of interactions (i.e. retweets, mentions) between users as an example definition of w d (v, t). in those network datasets where this type of information is not available, we simply use binary weights. finally, the document length is mapped to the sum of the weights of the neighborhood of the target user: len(v) = t∈Γ l (v) w l (v, t), which can be seen as a generalized notion of vertex degree in the social graph. for some methods (such as bm [ ] ), we may consider a different neighborhood orientation when computing the user "size"; this explains the different symbols Γ l , w l (not necessarily equal to Γ d , w d ) in the definition of len(v). in this framework, as the ir models rely on common neighbors between the target and the candidate user, they can only recommend people at distance . table summarizes the relation between the ir and contact recommendation tasks. further details about the mapping are described in [ ] . before analyzing the importance of the ir axioms in the recommendation task, we first recall the ir axioms, and reformulate them using the mapping from ir to contact recommendation. in the remainder of this section, we take the seven axioms proposed by fang et al. [ ] , divided into four categories, and analyze them. the first family of axioms analyzes the role of the frequency of the query terms in the retrieved documents. since term frequencies are represented as edge weights in our framework, we rename them as "edge weight constraints" (ewc) in our reformulation. 
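the mapping can be sketched as a generic friends-of-friends scorer in which a model-specific weighting is summed over the common neighbors of the target and candidate users; the dictionary-based edge representation and the term_weight hook are assumptions made for illustration.

```python
# Sketch of the IR-to-contact-recommendation mapping: target neighbors act as query
# terms, a candidate's weighted neighbors act as the document, and len(v) generalizes
# the vertex degree.
def doc_weights(v, weighted_edges, orientation="in"):
    """w_d(v, t): weight of the link between candidate v and neighbor t under the chosen
    orientation; weighted_edges maps (source, target) -> weight."""
    if orientation == "in":
        return {s: w for (s, t), w in weighted_edges.items() if t == v}
    return {t: w for (s, t), w in weighted_edges.items() if s == v}

def user_length(v, weighted_edges, orientation="in"):
    """len(v) = sum of edge weights of the neighborhood used for length normalization."""
    return sum(doc_weights(v, weighted_edges, orientation).values())

def score(target_neighbors, v, weighted_edges, term_weight):
    """Generic friends-of-friends scorer: sums a model-specific term_weight(w_d, len_v)
    over the common neighbors of the target and candidate users."""
    wd = doc_weights(v, weighted_edges)
    len_v = sum(wd.values())
    return sum(term_weight(wd[t], len_v) for t in target_neighbors if t in wd)
```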
the first constraint, tfc , establishes that if the only difference between two documents is the frequency of a query term, then the document with the higher term frequency should be ranked atop the other. the intuition behind this axiom is naturally translated to contact recommendation by considering the "common friends" principle in social bonding: all things being equal, you are more likely to connect to people who have stronger bonds to common friends. this principle can be expressed as follows: ewc : if the target user u has a single neighbor Γ_q(u) = {t}, and we have two different candidate users v1, v2 such that len(v1) = len(v2) and w_d(v1, t) > w_d(v2, t), then f_u(v1) > f_u(v2). the second term frequency constraint (tfc ) establishes that the ranking score increment produced by increasing term frequency should decrease with the frequency (i.e. ranking scores should have a dampened growth on term frequency, as in a diminishing returns pattern). this also has a direct meaning in the contact recommendation space: the difference in scores between two candidate contacts should decrease with the weights of their common friends with the target user. formally, this constraint is expressed as: ewc : for a target user u with a single neighbor Γ_q(u) = {t}, and three candidate users of equal length whose weights on t increase by equal amounts, each successive increase should produce a strictly smaller gain in the ranking score. finally, the third axiom reflects the following property: occurrence frequencies and discriminative power being equal, the document that covers more distinct query terms should attain a higher score. in people recommendation, this translates to the triadic closure principle [ , ] : all other things being equal, the more common friends a candidate contact has with the target user, the higher the chance that a new link between them exists. formally, the constraint requires that, for candidates of equal length whose common neighbors are equally discriminative, the candidate covering more distinct neighbors of u should attain the higher score, where td(t) is a measure of the informativeness of the common neighbors of the target and candidate users, as can be obtained from an idf measure. these three axioms are interdependent: if we take Γ_q(u) = {t} and we fix the values for td(t) and len(v), we can rewrite f_u(v) as a function g of the edge weight w_d(v, t), and the axioms are then characterized by whether g is increasing and strictly subadditive. given a function g, g positive and concave ⇒ g is increasing and subadditive. therefore, for such functions (as is the case for most of the classic ir functions), ewc ⇒ ewc ∧ ewc . however, if ewc is not satisfied, either ewc or ewc could still be satisfied. the term discrimination constraint is an axiom that formalizes the intuition that penalizing popular words in the collection (such as stopwords) and assigning higher weights to more discriminative query terms should produce better search results. this principle makes sense in contact recommendation: sharing a very popular and highly connected friend (e.g. two people following katy perry on twitter) may be a rather weak signal to infer that these two people would relate to each other. a less social common friend, however, may suggest that the two people indeed have more interests in common. this idea is in fact reflected in some contact recommendation algorithms such as adamic-adar [ , ] . hence, we rename the axiom as the "neighbor discrimination constraint" (ndc), and we adapt the version of the axiom proposed by shi et al. [ ] , which simplifies the translation to our domain: all else being equal, the candidate who shares less popular (more discriminative) neighbors with the target user should obtain the higher score. the third family of ir axioms studies how algorithms should deal with the length of the documents. as defined in sect. , in our mapping, the length of the document is translated to the sum of the edge weights between the candidate user and its neighbors: len(v).
as we only study the length of the candidate user, we rename this family of constraints as "candidate length normalization constraints" (clnc). fang et al. [ ] proposed two different lncs. the first axiom states that, for two documents with the same query term occurrence frequencies, we should choose the shorter one, since it contains the least amount of query-unrelated information. in contact recommendation, this means penalizing popular, highly connected candidate users with many neighbors not shared with the target user. we hence reformulate this axiom as: clnc1: given a target user u and two candidate users v1, v2 such that w_d(v1, t) = w_d(v2, t) for every neighbor t ∈ Γ_q(u) of the target user, if len(v1) < len(v2), then f_u(v1) ≥ f_u(v2). the second constraint aims to avoid over-penalizing long documents: it states that if a document is concatenated to itself multiple times, the resulting document should not get a lower score than the original. in contact recommendation, this means that, if we multiply all the edge weights of a candidate user by a positive constant, the score of the candidate user should not decrease. formally: clnc2: given a target user u and two candidate users v1, v2 such that w_d(v1, x) = k · w_d(v2, x) for all users x and some constant k > 1, and w_d(v2, t) > 0 for some neighbor t ∈ Γ_q(u) of the target user u, then we have f_u(v1) ≥ f_u(v2). the last heuristic aims to provide a balance between query term frequency in documents and length normalization. the axiom states that if we add more occurrences of a query term to a document, its retrieval score should increase. for contact recommendation, the intuition is similar: if the link weight between two users v and t increases, then v's score as a candidate for target users having t in their neighborhood should increase. this axiom is then expressed as follows: ew-clnc: given a target user u with a single neighbor Γ_q(u) = {t}, if two candidates v1 and v2 are such that w_d(v1, t) > w_d(v2, t) and len(v1) = len(v2) + w_d(v1, t) − w_d(v2, t), then f_u(v1) > f_u(v2). the first step to undertake an analysis of the ir axioms in contact recommendation is to determine the set of algorithms for which the different axioms are applicable and, for those, to identify which constraints they satisfy and under which conditions. in this section, we provide an overview of different contact recommendation methods and their relation with the axioms. we divide the approaches into two groups: friends-of-friends approaches, which only recommend people at network distance 2 from the target user, and methods which might recommend more distant users. the first group includes all the ir models, as well as other approaches such as most common neighbors (mcn) and adamic-adar's approach [ ] , whereas the second group includes matrix factorization [ , ] , random walk-based methods [ , ] and knn [ ] . the proposed set of constraints is not applicable to the algorithms in the second group, since the constraints are based on the idea that the weighting functions depend on the common users between the target and the candidate users. therefore, in the rest of the article, we focus on the algorithms in the first family. as future work, we envisage the formulation of new constraints tailored for algorithms that recommend users at distance greater than 2, possibly as a generalization of the set of constraints we study in this paper (see e.g. the formal analysis of pseudo-relevance feedback by clinchant and gaussier [ ] , which in our mapping would correspond to distance greater than 2). we start analyzing the friends-of-friends methods by studying the ir models.
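as a preliminary remark, each constraint above compares the scores of synthetic candidate profiles that differ in a controlled way, so satisfaction can be checked mechanically for any common-neighbour scoring function. the harness below is our own illustration, with hypothetical helper names and toy values; it checks ewc1 and clnc1 for a scoring function that takes the weights of the shared neighbours and the candidate length.

```python
# A scoring function here is f(common_weights, length):
#   common_weights: dict {t: w_d(v, t)} restricted to t in Gamma_q(u)
#   length:         len(v), the total weight of v's neighborhood
def mcn_score(common_weights, length):
    # most common neighbors: counts shared neighbors, ignores weights and length
    return sum(1 for w in common_weights.values() if w > 0)

def check_ewc1(f):
    # single common neighbor t, equal lengths, the first candidate has the heavier link
    hi = f({"t": 3.0}, 10.0)
    lo = f({"t": 1.0}, 10.0)
    return hi > lo

def check_clnc1(f):
    # same weights on the common neighbors, the second candidate has a larger neighborhood
    short = f({"t": 2.0}, 5.0)
    long_ = f({"t": 2.0}, 50.0)
    return short >= long_

for name, f in [("MCN", mcn_score)]:
    print(name, "EWC1:", check_ewc1(f), "CLNC1:", check_clnc1(f))
```

the same two-profile pattern extends to the remaining constraints by building the corresponding triples of candidates or by rescaling the weight dictionaries.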
in the adaptation of these models by sanz-cruzado and castells [ ] , the components of the ranking functions (frequency/weight, discriminative power functions, document/user length) maintain the basic properties on which the formal analysis by fang et al. [ , ] relied. therefore, the adapted methods satisfy the same constraints in the social network as those they satisfy in the textual ir space and, if they are satisfied only under certain conditions, we can find the new conditions just by adapting them to the contact recommendation task. models like pl2 [ , ] , the pivoted normalization vector space model (vsm) [ ] , query likelihood with dirichlet (qld) [ ] or jelinek-mercer (qljm) [ ] smoothing thus keep their original properties in this new space. we find however one point of difference, related to a possibility considered by sanz-cruzado and castells in the definition of the candidate user length; namely, that we can define the length of the candidate users by selecting a different neighborhood Γ_l(v) than the one used for defining the candidate user, Γ_d(v), as explained in sect. . as the only difference between the original bm25 and the version defined by sanz-cruzado and castells is precisely the definition of the candidate length, it is straightforward to prove that all edge weight constraints and ndc are satisfied in the same way as they are for textual ir: ndc is unconditionally true, whereas all ewc axioms depend just on the condition that the robertson-spärck-jones weight of the common neighbor t is positive, i.e. that t is linked by fewer than half of the users in the network. in contact recommendation, this is very likely to be true: twitter has hundreds of millions of users, and even the most followed account is followed by only a small fraction of them. on the other hand, differences arise when we study the constraints involving length normalization: the clncs and the ew-clnc. if we keep the same orientation for the user length as for the neighborhood selection of the candidate user, the mapping maintains the same components as the original ranking function and, consequently, the condition for satisfying the three axioms is the same as in the original analysis. however, if the orientation for the length is changed, it is easy to show that bm25 satisfies clnc1 only when two further conditions are either both true or both false, and that the ew-clnc is kept only when a similar pair of conditions are both met or neither is. the only length normalization-related constraint that is satisfied under the same conditions as the original bm25 model is the clnc2 constraint, since it does not really depend on the definition of user length. table shows the differences between the original version and this adaptation of the bm25 model for contact recommendation. hence, we introduce a new ir-based approach, namely the extreme bm25 (ebm25) method, a variant of bm25 where we make the k parameter tend to infinity. in comparison with bm25, all constraints are satisfied under the conditions specified for bm25, except ewc2 and ewc3, which are not satisfied at all by ebm25. in the bm25 model, under the conditions of ewc2, the k parameter establishes how f_u(v) grows as a function of the weight of the only common neighbor between the target and candidate users. the greater the value of k, the more the growth function approximates a linear function. when k → ∞, the growth becomes linear and, as a consequence, the model does not meet the ewc2 constraint. a similar issue occurs with ewc3. beyond the ir models, other approaches such as adamic-adar or mcn do operate at distance 2.
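before turning to those simpler friends-of-friends methods, the bm25 and extreme bm25 scoring just discussed can be sketched as follows. this is a simplified rendition under our own assumptions (an rsj-style discrimination term and a b-style length normalisation over an average length), not the exact formulation of [ ]; the limit k → ∞ makes the growth in the edge weight linear, which is how the extreme variant is obtained.

```python
import math

def rsj_weight(n_users, degree_t):
    """Robertson-Sparck-Jones-style discrimination for a common neighbor t."""
    return math.log((n_users - degree_t + 0.5) / (degree_t + 0.5))

def bm25_contact_score(common_weights, length, avg_length, degrees, n_users,
                       k=1.2, b=0.75):
    """BM25-style score of candidate v for target u.
    common_weights: {t: w_d(v, t)} for t in Gamma_q(u) shared with v
    degrees:        {t: size of t's neighborhood}, used for discrimination
    """
    norm = 1.0 - b + b * length / avg_length
    score = 0.0
    for t, w in common_weights.items():
        saturation = w * (k + 1.0) / (w + k * norm)  # dampened growth in w
        score += saturation * rsj_weight(n_users, degrees[t])
    return score

def ebm25_contact_score(common_weights, length, avg_length, degrees, n_users,
                        b=0.75):
    """'Extreme' BM25: the k -> infinity limit, linear in the edge weights."""
    norm = 1.0 - b + b * length / avg_length
    return sum(w / norm * rsj_weight(n_users, degrees[t])
               for t, w in common_weights.items())

common = {"t1": 3.0, "t2": 1.0}
degrees = {"t1": 40, "t2": 5}
print(bm25_contact_score(common, 20.0, 15.0, degrees, 1000))
print(ebm25_contact_score(common, 20.0, 15.0, degrees, 1000))
```

note that in the extreme variant the score grows linearly with the weight of each common neighbour, which is exactly why the diminishing-returns and subadditivity constraints are lost while the remaining ones are unaffected.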
in the particular case of these methods, they consider neither weights nor any means of normalization; only ewc and clnc are applicable here: under the conditions of ewc , both methods just measure the number of common neighbors, satisfying the constraint. for clnc , if we multiply all the weights of the link for a candidate by any number k = , the score of the functions would not vary (and, consequently, they meet the axiom). we summarize this analysis in table , where we identify whether a method satisfies (fully or conditionally) or not the different axioms. in the case of the models not described in this section (pivoted normalization vsm, pl , qld), we refer to the article by fang et al. [ ] for further information on the conditions to satisfy the axioms. next, we empirically analyze whether satisfying the axioms leads to an improvement of the performance of such algorithms. prior work on axiomatic thinking [ , ] has analyzed to which extent the satisfaction of a suitable set of constraints correlates with effectiveness. this is also a mechanism to validate such constraints, showing that it is useful to predict, explain or diagnose why an ir system is working well or badly. taking up this perspective, we undertake next such an empirical analysis of constraints in the contact recommendation setting, using a set of friends-of-friends algorithms. data: we use different network samples from twitter and facebook: the ego-facebook network released in the stanford large network dataset collection [ ] , and two twitter data downloads described in [ ] as -month and -tweets. the twitter downloads include each two different sets of edges for the same set of users: the follow network (where (u, v) ∈ e if u follows v), and the interaction network (where (u, v) ∈ e if u retweeted or mentioned v). the datasets are described in more detail in [ ] [ ] [ ] . for evaluation purposes, we partition each network into a training graph that is supplied as input to the recommendation algorithms, and a test graph that is held out for evaluation. using the test graph, ir metrics such as precision, recall or ndcg can be computed, as well as other accuracy metrics such as auc [ ] , by considering test edges as binary relevance judgements: a user v is relevant to a user u if -and only if -the edge (u, v) appears in the test graph. we further divide the training graph into a smaller training graph and a validation graph for parameter tuning. table shows the size of the different resulting subgraphs. for all twitter networks, temporal splits are applied: the training data includes edges created before a given time, and the test set includes links created afterwards. edges appearing in both sides of the split are removed from the test network. for the interaction network, two different temporal points are selected to generate the split: july th and july th in the -month dataset, and july th and july th in -tweets. weights for the training graphs were computed by counting the number of interactions before the splits. 
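the splitting and evaluation protocol described above (and continued below for the follow networks) can be sketched compactly: edges are split by timestamp, test edges act as binary relevance judgements, and a ranking is scored with ndcg at some cutoff. the data structures, timestamps and the cutoff below are illustrative assumptions, not the paper's actual data.

```python
import math

def temporal_split(edges, t_train_end):
    """edges: list of (u, v, timestamp). Test keeps only links not seen in training."""
    train = {(u, v) for u, v, ts in edges if ts <= t_train_end}
    test = {(u, v) for u, v, ts in edges if ts > t_train_end} - train
    return train, test

def ndcg_at_k(ranked, relevant, k=10):
    """ranked: candidate users sorted by score; relevant: set of test neighbors of u."""
    dcg = sum(1.0 / math.log2(i + 2) for i, v in enumerate(ranked[:k]) if v in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

edges = [("u", "a", 1), ("u", "b", 2), ("u", "c", 5), ("u", "d", 6)]
train, test = temporal_split(edges, t_train_end=4)
relevant = {v for (s, v) in test if s == "u"}   # binary judgements for target u
print(ndcg_at_k(["c", "x", "d"], relevant, k=10))
```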
for the follow networks, the edges between the users of the interaction network were downloaded three times: the first download is used as training graph for parameter tuning; the new links in the second snapshot (not present in the initial one), downloaded four months later, are used as the validation set; the complete second snapshot is given as input to the recommendation algorithms under evaluation; finally, the new edges in the third download (not present in the second), obtained two years afterwards, are used as the test data for evaluation. for the facebook data, since temporal information is not available, we apply a simple random split: % of links are sampled as training and % as test; within the training data, we use % of the edges as the validation subset. we focus on contact recommendation approaches that recommend users at distance . from that set, as representative ir models, we include adaptations for the pivoted normalization vector space model [ ] ; bir and bm [ ] as probabilistic models based on the probability ranking principle; query likelihood [ ] with jelinek-mercer [ ] , dirichlet [ ] and laplace [ ] smoothing as language models; and pl [ , ] , dfree, dfreeklim [ ] , dph [ ] and dlh [ ] as divergence from randomness approaches. in addition, we include adaptations of a number of link prediction methods [ ] (following [ ] ): adamic-adar [ ] , jaccard [ ] , most common neighbors [ ] and cosine similarity [ ] . we start by analyzing the edge weight constraints. since weights are binary in the twitter follow graphs and facebook, we focus here on interaction graphs, where the interaction frequency provides a natural basis for edge weighting. a first natural question that arises when we study these axioms is whether the weights are useful or not for providing good recommendations. this is equivalent to test the importance of the first axiom for the contact recommendation task. to answer that question, we compare the two options (binarized vs. not binarized weights) in all algorithms which make use of weights: cosine similarity between users and all the ir models except bir. we show the results in fig. (a) , where each dot represents a different approach. in the x axis, we show the ndcg@ value for the unweighted approaches, whereas the y axis shows ndcg@ for the weighted ones. we can see that using weights results in an inferior performance in all algorithms except for bm and the simple cosine similarity. these observations suggest that ewc does not appear to be a reliable heuristic for contact recommendation in networks. however, once the weight is important for a model (and, therefore, ewc is important) does satisfying the rest of the edge weight constraints provide more accurate recommendations? to check that, similarly to fang et al. [ , ] , we compare an algorithm that satisfies all three ewcs (and benefits from weights) with another one that does not satisfy ewc and ewc : we compare bm vs. ebm . fixing the k parameter for the bm model (using the optimal configuration from our experiments), we compare different parameter configurations for bm and ebm . results are shown in fig. (b) , where every dot in the plot corresponds to a different model configuration, the x axis represents the ndcg@ values for bm , and the y axis those of the ebm model. as it can be observed, ebm does not improve over bm for almost every configuration (dots are all below the y = x plane), thus showing that, as long as ewc is important for the model, both ewc and ewc are relevant. 
as explained in sect. , ewc can also be satisfied independently of ewc and ewc , so we finally check its importance. for that purpose, we address the following question: for any friends-of-friends algorithm, such as adamic-adar [ ] or the ir models, is it beneficial to reward the number of common users between the target and the candidate users? to analyze this, we compare the mcn approach (which satisfies the constraint) with a binarized version of mcn which returns all people at distance regardless of the common neighbor count. restricting the test set to people at distance , table shows the resulting auc [ ] of the mcn algorithm, averaged over users on each network. under these conditions, the binarized version would have an auc value of . . hence, our results show that the number of common neighbors seem to be a strong signal for providing accurate recommendations (and, therefore, ewc seems to be important on its own for the contact recommendation task). neighbor discrimination constraint (ndc): as previously explained, this constraint suggests penalizing highly popular common neighbors. in ir approaches, this constraint is satisfied or not depending on the presence or absence of a term discrimination element (such as the robertson-spärck-jones in bm /ebm or the p c (t) term in query likelihood approaches). therefore, to check the effectiveness benefit of this axiom, we compare -in terms of ndcg@ -the bm , ebm , qld, qljm and the pivoted normalization vsm models with variants of them that lack term discrimination. figure shows the difference between different variants of each model. in the figure, a positive value indicates that the original version (with term discrimination) performs better. we observe that in an overwhelming majority of points the original versions achieve a better accuracy, hence ndc appears to be key to providing good contact recommendations. this confirms the hypothesis in many recommendation approaches that using high-degree users to discriminate which users are recommended does not seem to be a good idea [ , ] . finally, we study the effect of normalizing by candidate user length. for that purpose, similarly to the previous section, we compare the bm , ebm , qljm, qld and the pivoted normalization vsm models with versions of the models lacking the normalization by the candidate user length (which do not satisfy clnc and ew-clnc) using ndcg@ . we show a graph showing the differences in accuracy between different variants of the algorithms in fig. (a) . since there are few differences between datasets, we only show results for the interactions network of the twitter -month dataset. in the figure, we observe an opposite trend to what was expected: instead of performing worse, the algorithms without normalization do improve the results. therefore, it seems that the different length normalization constraints are not useful for contact recommendation. these observations are consistent with the preferential attachment phenomenon in social networks [ ] , whereby high-degree users are more likely to receive new links than long-tail degree users. as an example, we check this in fig. (b) , where we compare the performances of the recommendation approaches listed in section . with the average in-degree, out-degree and (undirected) degree of the recommended people. we observe that, in general, in-degree and degree are clearly correlated with the performances of the methods, as the principle indicates. with out-degree this is not so clear though. 
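the preferential-attachment check reported in this passage amounts to averaging the degrees of the users each method recommends and comparing those averages with the accuracy of the method. a minimal sketch, with the recommendation output as a hypothetical structure, is:

```python
import networkx as nx

def mean_degrees_of_recommendations(G, recommendations, k=10):
    """Average in-, out- and undirected degree of the top-k recommended users.
    recommendations: {target: ranked list of candidate users}, i.e. the
    (assumed) output of any of the recommenders discussed above."""
    recs = [v for ranked in recommendations.values() for v in ranked[:k]]
    if not recs:
        return 0.0, 0.0, 0.0
    und = G.to_undirected()
    avg_in = sum(G.in_degree(v) for v in recs) / len(recs)
    avg_out = sum(G.out_degree(v) for v in recs) / len(recs)
    avg_und = sum(und.degree(v) for v in recs) / len(recs)
    return avg_in, avg_out, avg_und
```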
the weaker correlation with out-degree explains the few configurations in fig. (a) that do not improve when we remove the normalization: all of them normalize by the sum of the weights of the outgoing links of the candidate users. similar trends are observed in other networks. we have theoretically and empirically analyzed the importance of the fundamental ir axioms for the contact recommendation task in social networks. theoretically, we have translated the different axioms proposed in [ ] to the contact recommendation task, and we have checked whether the mapping introduced in [ ] is sound and complete. we have found that, in general, the properties of the ir models hold in the recommendation task when we apply this mapping, unless we use a definition of the document length different from the usual one. empirically, we have conducted several experiments over various twitter and facebook networks to check whether those axioms have any positive effect on the accuracy of the recommenders. we showed that satisfying the constraints related to term frequencies and term discrimination has a positive impact on the accuracy. however, the constraints related to length normalization tend to have the opposite effect, as they interfere with a basic evolutionary principle of social networks, namely preferential attachment [ ] .
references:
- friends and neighbors on the web
- toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions
- probability models for information retrieval based on divergence from randomness
- frequentist and bayesian approach to information retrieval
- fub, iasi-cnr and university of tor vergata at trec blog track
- fub, iasi-cnr, univaq at trec microblog track
- probabilistic models of information retrieval based on measuring the divergence from randomness
- emergence of scaling in random networks
- information filtering and information retrieval: two sides of the same coin?
- bridging memory-based collaborative filtering and text retrieval
- a theoretical analysis of pseudo-relevance feedback models
- a formal study of information retrieval heuristics
- diagnostic evaluation of information retrieval models
- semantic term matching in axiomatic approaches to information retrieval
- an introduction to roc analysis
- the who-to-follow system at twitter: strategy, algorithms, and revenue impact
- recommending twitter users to follow using content and collaborative filtering approaches
- collaborative filtering for implicit feedback datasets
- étude comparative de la distribution florale dans une portion des alpes et des jura
- interpolated estimation of markov source parameters from sparse data
- matrix factorization techniques for recommender systems
- the link-prediction problem for social networks
- a hierarchical dirichlet language model
- learning to discover social circles in ego networks
- clustering and preferential attachment in growing networks
- networks: an introduction, 1st edn
- a language modeling approach to information retrieval
- an axiomatic approach to diagnosing neural ir models
- the probabilistic relevance framework: bm25 and beyond
- an axiomatic approach to regularizing neural ranking models
- a vector space model for automatic indexing
- contact recommendations in social networks
- enhancing structural diversity in social networks by recommending weak ties
- information retrieval models for contact recommendation in social networks
- gravitation-based model for information retrieval
- proceedings of the th text retrieval conference (trec )
- social recommendation: a review
- an exploration of proximity measures in information retrieval
- axiomatic analysis of language modelling of recommender systems
- finding and analysing good neighbourhoods to improve collaborative filtering. knowl.-based syst.
- algorithms for estimating relative importance in networks
- a study of smoothing methods for language models applied to information retrieval
- predicting missing links via local information
acknowledgements. j. sanz-cruzado and p. castells were partially supported by the spanish government (tin - -p). c. macdonald and i. ounis were partially supported by the european community's horizon programme, under the grant agreement entitled bigdatastack.
key: cord- -b f wtfn authors: caldarelli, guido; nicola, rocco de; petrocchi, marinella; pratelli, manuel; saracco, fabio title: analysis of online misinformation during the peak of the covid-19 pandemics in italy date: - - journal: nan doi: nan sha: doc_id: cord_uid: b f wtfn
propaganda and disinformation have a history as long as mankind, and the phenomenon becomes particularly strong in difficult times, such as wars and natural disasters. the advent of the internet and social media has amplified and accelerated the spread of biased and false news, and has made it possible to target specific segments of the population [ ] . for this reason the vice-president of the european commission with responsibility for policies on values and transparency, věra jourová, announced, at the beginning of june, a european democracy action plan, expected by the end of the year, in which web platform admins will be called to greater accountability and transparency, since 'everything cannot be allowed online' [ ] . manufacturers and spreaders of online disinformation have been particularly active also during the covid-19 pandemic period (e.g., writing about bill gates' role in the pandemics or about masks killing children [ , ] ). this, alongside the real pandemics [ ] , has led to the emergence of a new virtual disease: covid-19 infodemics. in this paper, we shall consider the situation in italy, one of the most affected countries in europe, where the virus struck in a devastating way between the end of february and the end of april [ ] . in such a sad and uncertain time, propaganda has worked hard (in italy, since the beginning of the pandemics and at the time of writing, almost k persons have contracted the covid-19 virus; of these, more than k have died; source: http://www.protezionecivile.gov.it/, accessed september , ): one of the most followed pieces of fake news was published by sputnik italia, receiving a very large number of likes, shares and comments on the most popular social media. 'the article falsely claimed that poland had not allowed a russian plane with humanitarian aid and a team of doctors headed to italy to fly over its airspace', the ec vice-president jourová said. actually, the studies regarding dis/mis/information diffusion on social media seldom analyse its effective impact. in the exchange of messages on online platforms, a great amount of interactions does not carry any relevant information for the understanding of the phenomenon: as an example, randomly retweeting viral posts does not contribute to insights on the sharing activity of the account. for determining dis/misinformation propagation two main weapons can be used: the analysis of the content (semantic approach) and the analysis of the communities sharing the same piece of information (topological approach). while the content of a message can be analysed on its own, the presence of some troublesome structure in the pattern of news producers and spreaders (i.e., in the topology of contacts) can be detected only through dedicated instruments. indeed, for real in-depth analyses, the properties of the real system should be compared with a proper null model. recently, entropy-based null models have been successfully employed to filter out random noise from complex networks and focus the attention on non-trivial contributions [ , ] . essentially, the method consists in defining a 'network benchmark' that has some of the (topological) properties of the real system, but is completely random for all the rest. then, every observation that does not agree with the model, i.e., cannot be explained by the topological properties of the benchmark, carries non-trivial information. notably, being based on the shannon entropy, the benchmark is unbiased by definition.
in the present paper, using entropy-based null-models, we analyse a tweet corpus related to the italian debate on covid- during the two months of maximum crisis in italy. after cleaning the system from the random noise, by using the entropy-based null-model as a filter, we have been able to highlight different communities. interestingly enough, these groups, beside including several official accounts of ministries, health institutions, and -online and offline -newspapers and newscasts, encompass four main political groups. while at first sight this may sound surprising -the pandemic debate was more on a scientific than on a political ground, at least in the very first phase of its abrupt diffusion -, it might be due to pre-existing echo chambers [ ] . the four political groups are found to perform completely different activities on the platform, to interact differently from each other, and to post and share reputable and non reputable sources of information with great differences in the number of their occurrences. in particular, the accounts from the right wing community interact, mainly in terms of retweets, with the same accounts who interact with the mainstream media. this is probably due to the strong visibility given by the mainstream media to the leaders of that community. moreover, the right wing community is more numerous and more active, even relatively to the number of accounts involved, than the other communities. interestingly enough, newly formed political parties, as the one of the former italian prime minister matteo renzi, quickly imposed their presence on twitter and on the online political debate, with a strong activity. furthermore, the different political parties use different sources for getting information on the spreading on the pandemics. to detect the impact of dis/misinformation in the debate, we consider the news sources shared among the accounts of the various groups. with a hybrid annotation approach, based on independent fact checking organisations and human annotation, we categorised such sources as reputable and non reputable (in terms of credibility of the published news and the transparency of the sources). notably, we experienced that a group of accounts spread information from non reputable sources with a frequency almost times higher than that of the other political groups. and we are afraid that, due to the extent of the online activity of the members of this community, the spreading of such a volume of non reputable news could deceit public opinion. we collected circa . m tweets in italian language, from february st to april th [ ] . details about the political situation in italy during the period of data collection can be found in the supplementary material, section . : 'evolution of the covid- pandemics in italy'. the data collection was keyword-based, with keywords related the covid- pandemics. twitter's streaming api returns any tweet containing the keyword(s) in the text of the tweet, as well as in its metadata. it is worth noting that it is not always necessary to have each permutation of a specific keyword in the tracking list. for example, the keyword 'covid' will return tweets that contain both 'covid ' and 'covid- '. table lists a subset of the considered keywords and hashtags. there are some hashtags that overlap due to the fact that an included keyword is a sub-string of another one, but we included both for completeness. the left panel of fig. shows the network obtained by following the projection procedure described in section . . 
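as an aside on the keyword-based collection described above, the substring matching behaviour (so that, e.g., 'covid' also catches 'covid19' or 'covid-19') can be reproduced offline on an already collected dump. the sketch below is ours; the file name, the tweet fields and the keyword subset are assumptions, and this is not the twitter streaming endpoint itself.

```python
import json

KEYWORDS = ["covid", "coronavirus", "quarantena", "iorestoacasa"]  # illustrative subset

def matches(tweet_text, keywords=KEYWORDS):
    """Case-insensitive substring match, so 'covid' also catches 'covid19'."""
    text = tweet_text.lower()
    return any(kw in text for kw in keywords)

def filter_dump(path="tweets.jsonl"):
    """Keep only the tweets whose text mentions at least one tracked keyword."""
    kept = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            tweet = json.loads(line)
            text = tweet.get("full_text") or tweet.get("text", "")
            if matches(text):
                kept.append(tweet)
    return kept
```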
the network resulting from the projection procedure will be called, in the rest of the paper, the validated network. the term validated should not be confused with the term verified, which instead denotes a twitter user who has passed the formal authentication procedure of the social platform. in order to get the communities of verified twitter users, we applied the louvain algorithm [ ] to the validated network. such an algorithm, despite being one of the most popular, is also known to be order dependent [ ] . to get rid of this bias, we apply it iteratively n times (n being the number of nodes) after reshuffling the order of the nodes. finally, we select the partition with the highest modularity. the network presents a strong community structure, composed of four main subgraphs. when analysing the emerging communities, we find that they correspond to:
- right wing parties and media (steel blue);
- the center-left wing (dark red);
- the five star movement, m5s (dark orange);
- institutional accounts (sky blue).
details about the political situation in italy during the period of data collection can be found in the supplementary material, section . : 'italian political situation during the covid-19 pandemics'. this partition in four subgroups, once examined in more detail, presents a richer substructure, described in the right panel of fig. . starting from the center-left wing, we can find a darker red community, including various ngos and various left-oriented journalists, vips and pundits. a slightly lighter red sub-community turns out to be composed of the main politicians of the italian democratic party (pd), as well as of representatives of the european parliament (italian and others) and some eu commissioners. the violet red group is mostly composed of the representatives of italia viva, a new party founded by the former italian prime minister matteo renzi (december - february ). in golden red we can find the subcommunity of catholic and vatican groups. finally, the dark violet red and light tomato subcommunities consist mainly of journalists. in turn, also the orange (m5s) community shows a clear partition into substructures. in particular, the dark orange subcommunity contains the accounts of politicians, parliament representatives and ministers of the m5s, together with journalists. in aquamarine, we can find the official accounts of some private and public, national and international, health institutes. finally, in the light slate blue subcommunity we can find various italian ministers as well as the italian police and army forces. similar considerations apply to the steel blue community. in steel blue proper, we find the subcommunity of center-right and right wing parties (such as forza italia, lega and fratelli d'italia); in the following, this subcommunity will be referred to as fi-l-fdi, recalling the initials of the political parties contributing to this group. the sky blue subcommunity includes the national federations of various sports, the official accounts of athletes and sport players (mostly soccer) and their teams. the teal subcommunity contains the main italian news agencies; in this subcommunity there are also the accounts of many universities. the firebrick subcommunity contains accounts related to the as roma football club; analogously, the dark red one contains the official accounts of ac milan and its players. the slate blue subcommunity is mainly composed of the official accounts of radio and tv programs of mediaset, the main private italian broadcasting company.
finally, the sky blue community is mainly composed by italian embassies around the world. for the sake of completeness, a more detailed description of the composition of the subcommunities in the right panel of figure is reported in the supplementary material, section . : 'composition of the subcommunities in the validated network of verified twitter users'. here, we report a series of analyses related to the domain names, hereafter simply called domains, that mostly appear in all the tweets of the validated network of verified users. the domains have been tagged according to their degree of credibility and transparency, as indicated by the independent software toolkit newsguard https://www.newsguardtech.com/. the details of this procedure are reported below. as a first step, we considered the network of verified accounts, whose communities and sub-communities are shown in fig. . on this topology, we labelled all domains that had been shared at least times (between tweets and retweets). table shows the tags associated to the domains. in the rest of the paper, we shall be interested in quantifying reliability of news sources publishing during the period of interest. thus, for our analysis, we will not consider those sources corresponding to social networks, marketplaces, search engines, institutional sites, etc. tags r, ∼ r and nr in table are used only for news sites, be them newspapers, magazines, tv or radio social channels, and they stand for reputable, quasi reputable, not reputable, respectively. label unc is assigned to those domains with less than occurrences in ours tweets and rewteets dataset. in fact, the labeling procedure is a hybrid one. as mentioned above, we relied on newsguard, a plugin resulting from the joint effort of journalists and software table tags used for labeling the domains developers aiming at evaluating news sites according to nine criteria concerning credibility and transparency. for evaluating the credibility level, the metrics consider whether the news source regularly publishes false news, does not distinguish between facts and opinions, does not correct a wrongly reported news. for transparency, instead, the tool takes into account whether owners, founders or authors of the news source are publicly known; and whether advertisements are easily recognizable [ ] . after combining the individual scores obtained out of the nine criteria, the plugin associates to a news source a score from to , where is the minimum score for the source to be considered reliable. when reporting the results, the plugin provides details about the criteria which passed the test and those that did not. in order to have a sort of no-man's land and not to be too abrupt in the transition between reputability and non-reputability, when the score was between and , we considered the source to be quasi reputable, ∼r. it is worth noting that not all the domains in the dataset under investigation were evaluated by newsguard at the time of our analysis. for those not evaluated automatically, the annotation was made by three tech-savvy researchers, who assessed the domains by using the same criteria as newsguard. table gives statistics about number and kind of tweets (tw = pure tweet; rt = retweet), the number of url and distinct url (dist url), the number of domains and users in the validated network of verified users. 
we clarify what we mean by these terms with an example: a domain for us corresponds to the so-called 'second-level domain' name [ ] , i.e., the name directly to the left of .com, .net, and any other top-level domains. for instance, repubblica.it, corriere.it, nytimes.com are considered domains by us. instead, the url maintains here its standard definition [ ] and an example is http://www.example.com/index.html. table shows the outcome of the domains annotation, according to the scores of newsguard or to those assigned by the three annotators, when scores were no available from newsguard. at a first glance, the majority of the news domains belong to the reputable category. the second highest percentage is the one of the untagged domains -unc. in fact, in our dataset there are many domains that occur only few times once. for example, there are domains that appear in the datasets only once. fig. shows the trend of the number of tweets and retweets, containing urls, posted by the verified users of the validated projection during the period of data [ ] newsguard rating process: https://www.newsguardtech.com/ratings/rating-process-criteria/ [ ] https://en.wikipedia.org/wiki/domain_name [ ] table annotation results over all the domains in the whole dataset -validated network of verified users. in [ ] . going on with the analysis, table shows the percentage of the different types of domains for the communities identified in the left plot of fig. . it is worth observing that the steel blue community (both politicians and media) is the most active one, even if it is not the most represented: the number of users is lower than the one of the center left community (the biggest one, in terms of numbers), but the number of their posts containing a valid url is almost the double of that of the second more active community. interestingly, the activity of the verified users of the steel blue community is more focused on content production of (see the only tweets sub-table) than in sharing (see the only retweets sub-table). in fact, retweets represent almost . % of all posts from the media and the right wing community, while in the case of the center-left community it is . %. this effect is observable even in the average only tweets post per verified user: a right-wing user and a media user have an average of . original posts, against . for center-left-wing users. these numbers are probably due to the presence in the former community of the italian most accessed media. they tend to spread their (original) pieces of news on the twitter platform. interestingly, the presence of urls from a non reputable source in the steel blue community is more than times higher than the second score in the same field in the case of original tweets (only tweets). it is worth noting that, for the case of the dark orange and sky blue communities, which are smaller both in terms of users and number of posts, the presence of non classified sources is quite strong (it represents nearly % of retweeted posts for both the communities), as it is the frequency of posts linking to social network contents. interestingly enough, the verified users of both groups seem to focus slightly more on the same domains: there are, on average, . and . posts for each url domain respectively for the dark orange and sky blue communities, and, on average, . and . posts for the steel blue and the dark red communities. the right plot in fig. 
report a fine grained division of communities: the four largest communities have been further divided into sub-communities, as mentioned in subsection . . here, we focus on the urls shared in the purely political sub-communities in table . broadly speaking, we examine the contribution of the different political parties, as represented on twitter, to the spread of mis/disinformation and propaganda. table clearly shows how the vast majority of the news coming from sources considered scarce or non reputable are tweeted and retweeted by the steel blue political sub-community (fi-l-fdi). notably, the percentage of non reputable sources shared by the fi-l-fdi accounts is more than times the percentage of their community (the steel blue one) and it is more than times the second community in the nr ratio ranking. for all the political sub-communities the incidence of social network links is much higher than in their original communities. looking at table , even if the number of users in each political sub-community is much smaller, some peculiar behaviours can be still be observed. again, the center-right and right wing parties, while representing the least represented ones in terms of users, are much more active than the other groups: each (verified) user is responsible, on average of almost . messages, while the average is . , . and . for m s, iv and pd, respectively. it is worth noticing that italia viva, while being a recently founded party, is very active; moreover, for them the frequency of quasi reputable sources is quite high, especially in the case of only tweets posts. the impact of uncategorized sources is almost constant for all communities in the retweeting activity, while it is particularly strong for the m s. finally, the posts by the center left communities (i.e., italia viva and the democratic party) tend to have more than one url. specifically, every post containing at least a url, has, on average, . and . urls respectively, against the . of movimento stelle and . for the center-right and right wing parties. to conclude the analysis on the validated network of verified users, we report statistics about the most diffused hashtags in the political sub-communities. fig. focuses on wordclouds, while fig. reports the data under an histograms form. actually, from the various hashtags we can derive important information regarding the communications of the various political discursive communities and their position towards the management of the pandemics. first, it has to be noticed that the m s is the greatest user of hashtags: their two most used hashtags have been used almost twice the most used hashtags used by the pd, for instance. this heavy usage is probably due to the presence in this community of journalists and of the official account of il fatto quotidiano, a newspaper explicitly supporting the m s: indeed, the first two hashtags are "#ilfattoquotidiano" and "#edicola" (kiosk, in italian). it is interesting to see the relative importance of hashtags intended to encourage the population during the lockdown: it is the case of "#celafaremo" (we will make it), "#iorestoacasa" (i am staying home), "#fermiamoloinsieme" (let's stop it together ): "#iorestoacasa" is present in every community, but it ranks th in the m s verified user community, th in the fi-l-fdi community, nd in the italia viva community and th in the pd one. 
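the source labelling behind the r / ∼r / nr / unc columns in the tables above can be sketched as follows. the second-level-domain extraction is a naive two-label heuristic (a production version would rely on the public suffix list); the score table, the minimum-occurrence filter and the width of the 'quasi reputable' band are our own illustrative assumptions, with newsguard's published cut-off of 60 used as the reputable threshold.

```python
from urllib.parse import urlparse
from collections import Counter

def second_level_domain(url):
    """Naive second-level domain: last two host labels ('www.repubblica.it' -> 'repubblica.it')."""
    host = urlparse(url).netloc.lower().split(":")[0]
    labels = [l for l in host.split(".") if l]
    return ".".join(labels[-2:]) if len(labels) >= 2 else host

def label_domain(domain, scores, occurrences, min_occurrences=10,
                 reputable=60, quasi_band=10):
    """scores: {domain: credibility/transparency score}; returns R / ~R / NR / UNC."""
    if occurrences[domain] < min_occurrences or domain not in scores:
        return "UNC"
    s = scores[domain]
    if s >= reputable:
        return "R"
    if s >= reputable - quasi_band:
        return "~R"
    return "NR"

urls = ["https://www.repubblica.it/x", "http://voxnews.info/y", "https://www.repubblica.it/z"]
occurrences = Counter(second_level_domain(u) for u in urls)
scores = {"repubblica.it": 80, "voxnews.info": 10}   # illustrative scores, not NewsGuard output
for d in occurrences:
    print(d, label_domain(d, scores, occurrences, min_occurrences=1))
```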
remarkably, "#celafaremo" is present only in the m s group, as "#fermiamoloinsieme" can be found in the top hashtags only in the center-right and right wing cluster. the pd, being present in various european institutions, mentions more european related hashtags ("#europeicontrocovid ", europeans against covid- ), in order to ask for a common reaction of the eu. the center-right and right wing community has other hashtags as "#forzalombardia" (go, lombardy! ), ranking the nd, and "#fermiamoloinsieme", ranking th. what is, nevertheless, astonishing, is the presence among the most used hashtags of all communities of the name of politicians from the same group ('interestingly '#salvini" is the first used hashtag in the center right and right wing community, even if he did not perform any duty in the government), tv programs ("#mattino ", "#lavitaindiretta", "#ctcf", "#dimartedì"), as if the main usage of hashtags is to promote the appearance of politicians in tv programs. finally, the hashtags used by fi-l-fdi are mainly used to criticise the actions of the government, e.g., "#contedimettiti" (conte, resign! ). fig. shows the structure of the directed validated projection of the retweet activity network, as outcome of the procedure recalled in section of the supplementary material. as mentioned in section of the supplementary material, the affiliation of unverified users has been determined using the tags obtained by the validated projected network of the verified users, as immutable label for the label propagation of [ ] . after label propagation, the representation of the political communities in the validated retweet network changes dramatically with respect to the case of the network of verified users: the center-right and right wing community is the most represented community in the whole network, with users (representing . % of all the users in the validated network), followed by italia viva users with accounts ( . % of all the accounts in the validated network). the impact of m s and pd is much more limited, with, respectively, and accounts. it is worth noting that this result is unexpected, due to the recent formation of italia viva. as in our previous study targeting the online propaganda [ ] , we observe that the most effective users in term of hub score [ ] are almost exclusively from the center-right and right wing party: considering the first hubs, only are not from this group. interestingly, out of these are verified users: roberto burioni, one of the most famous italian virologists, ranking nd, agenzia ansa, a popular italian news agency, ranking st, and tgcom , the popular newscast of a private tv channel, ranking rd. the fourth account is an online news website, ranking th: this is a not verified account which belongs to a not political community. remarkably, in the top hubs we find of the top hubs already found when considered the online debate on migrations from northern africa to italy [ ] : in particular, a journalist of a neo-fascist online newspaper (non verified user), an extreme right activist (non verified user) and the leader of fratelli d'italia giorgia meloni (verified user), who ranks rd in the hub score. matteo salvini (verified user), who was the first hub in [ ] , ranks th, surpassed by his party partner claudio borghi, ranking th. the first hub in the present network is an extreme right activist, posting videos against african migrants to italy and accusing them to be responsible of the contagion and of violating lockdown measures. 
table shows the annotation results for all the domains tweeted and retweeted by users in the directed validated network. the numbers are much higher than those shown in table , but the trend confirms the previous results. the majority of urls traceable to news sources are considered reputable. the number of unclassified domains is higher too; in fact, in this case, the annotation was made considering the domains occurring at least times.
table annotation results over all the domains - directed validated network
table reports statistics about posts, urls, distinct urls, users and verified users in the directed validated network. noticeably, by comparing these numbers with those of table , reporting statistics about the validated network of verified users, we can see that here the number of retweets is much higher, and the trend is the opposite: verified users tend to tweet more than retweet ( vs ), while users in the directed validated network, which also comprises non-verified users, have a number of retweets . times higher than the number of their tweets. fig. shows the trend of the number of tweets containing urls over the period of data collection. since we are analysing a bigger network than the one considered in section . , we have numbers that are one order of magnitude greater than those shown in fig. ; the highest peak, after the discovery of the first cases in lombardy, corresponds to more than , posts containing urls, whereas the analogous peak in fig. corresponds to , posts. apart from the order of magnitude, the two plots feature similar trends: higher traffic before the beginning of the italian lockdown, and a settling down as the quarantine went on (the low peaks for february and march are due to an interruption in the data collection, caused by a connection breakdown). table shows the core of our analysis, that is, the distribution of reputable and non reputable news sources in the directed validated network, consisting of both verified and non-verified users. again, we focus directly on the political sub-communities identified in the previous subsection. two of the sub-communities are part of the center-left wing community, one is associated to the five star movement, and the remaining one represents the center-right and right wing communities. in line with the previous results on the validated network of verified users, the table clearly shows how the vast majority of the news coming from sources considered scarce or non reputable are tweeted and retweeted by the center-right and right wing communities; % of the domains tagged as nr are shared by them. as shown in table , the activity of fi-l-fdi users is again extremely high: on average there are . retweets per account in this community, against the . of m5s, the . of iv and the . of pd. the right wing contribution to the debate is extremely high even in absolute numbers, due to the large number of users in this community. it is worth mentioning that the frequency of non reputable sources in this community is really high (at about % of the urls in the only tweets) and comparable with that of the reputable ones (see table , only tweets).
table domains annotation per political sub-communities - directed validated network
in the other sub-communities, pd users are more focused on un-categorised sources, while users from both italia viva and movimento 5 stelle are mostly tweeting and retweeting reputable news sources. the weight of non reputable sources is evident not only in relative terms, but also in absolute numbers: out of the over m tweets, more than k tweets refer to a nr url.
actually, the political competition still shines through the hashtag usage even for the other communities: it is the case, for instance, of italia viva. in the top hashtags we can find '#salvini', '#lega', but also '#papeete' [ ] , '#salvinisciacallo' (salvini jackal ) and '#salvinimmmerda' (salvini asshole). on the other hand, in italia viva hashtags supporting the population during the lockdown are used: '#iorestoacasa', '#restoacasa' (i am staying home), '#restiamoacasa' (let's stay home). criticisms towards the management of lombardy health system during the pandemics can be deduced from the hashtag '#commissariamtelalombardia' (put lombardy under receivership) and '#fontana' (the lega administrator of the lombardy region). movimento stelle has the name of the main leader of the opposition '#salvini', as first hashtag and supports criticisms to the lombardy administration with the hashtags '#fontanadimettiti' (fontana, resign! ) and '#gallera', the health and welfare minister of the lombardy region, considered the main responsible for the bad management of the pandemics. nevertheless, it is possible to highlight even some hashtags encouraging the population during the lock down, as the above mentioned '#iorestoacasa', '#restoacasa' and '#restiamoacasa'. it is worth mentioning that the government measures, and the corresponding m s campaigns, are accompanied specific hashtags: '#curaitalia' is the name of one of the decree of the prime minister to inject liquidity in the italian economy, '#acquistaitaliano' (buy italian products! ), instead, advertise italian products to support the national economy. as a final task, over the whole set of tweets produced or shared by the users in the directed validated network, we counted the number of times a message containing a url was shared by users belonging to different political communities, although without considering the semantics of the tweets. namely, we ignored whether the urls were shared to support or to oppose the presented arguments. table shows the most tweeted (and retweeted) nr domains shared by the political communities presented in table , the number of occurrences is reported next to each domain. the first nr domains for fi-l-fdi in table are related to the right, extreme right and neo-fascist propaganda, as it is the case of imolaoggi.it, ilprimatonazionale.it and voxnews.info, recognised as disinformation websites by newsguard and by the two main italian debunker websites, bufale.net and butac.it. as shown in the table, some domains, although in different number of occurrences, are present under more than one column, thus shared by users close to different political communities. this could mean, for some subgroups of the community, a retweet with the aim of supporting the opinions expressed in the original tweets. however, since the semantics of the posts in which these domains are present were not investigated, the retweets of the links by more than one political community could be due to contrast, and not to support, the opinions present in the original posts. despite the fact that the results were achieved for a specific country, we believe that the applied methodology is of general interest, being able to show trends and peculiarities whenever information is exchanged on social networks. in particular, when analysing the outcome of our investigation, some features attracted our attention: persistence of clusters wrt different discussion topics: in caldarelli et al. 
[ ] , we focused on tweets concerned with immigration, an issue that has been central in the italian political debate for years. here, we discovered that the clusters and the echo chambers that have been detected when analysing tweets about immigration are almost the same as those singled out when considering discussions concerned with covid- . this may seem surprising, because a discussion about covid- may not be exclusively political, but also medical, social, economic, etc.. from this we can argue that the clusters are political in nature and, even when the topic of discussion changes, users remain in their cluster on twitter. (indeed, journalists and politicians use twitter for information and political propaganda, respectively). the reasons political polarisation and political vision of the world affect so strongly also the analysis of what should be an objective phenomenon is still an intriguing question. persistence of online behavioral characteristics of clusters: we found that the most active, lively and penetrating online communities in the online debate on covid- are the same found in [ ] , formed in a almost purely political debate such as the one represented by the right of migrants to land on the italian territory. (dis)similarities amongst offline and online behaviours of members and voters of parties: maybe less surprisingly, the political habits is also reflected in the degree of participation to the online discussions. in particular, among the parties in the centre-left-wing side, a small party (italia viva) shows a much more effective social presence than the larger party of the italian centre-left-wing (partito democratico), which has many more active members and more parliamentary representation. more generally, there is a significant difference in social presence among the different political parties, and the amount of activity is not at all proportional to the size of the parties in terms of members and voters. spread of non reputable news sources: in the online debate about covid- , many links to non reputable (defined such by newsguard, a toolkit ranking news website based on criteria of transparency and credibility, led by veteran journalists and news entrepreneurs) news sources are posted and shared. kind and occurrences of the urls vary with respect to the corresponding political community. furthermore, some of the communities are characterised by a small number of verified users that corresponds to a very large number of acolytes which are (on their turn) very active, three times as much as the acolytes of the opposite communities in the partition. in particular, when considering the amount of retweets from poorly reputable news sites, one of the communities is by far (one order of magnitude) much more active than the others. as noted already in our previous publication [ ] , this extra activity could be explained by a more skilled use of the systems of propaganda -in that case a massive use of bot accounts and a targeted activity against migrants (as resulted from the analysis of the hub list). our work could help in steering the online political discussion around covid- towards an investigation on reputable information, while providing a clear indication of the political inclination of those participating in the debates. more generally, we hope that our work will contribute to finding appropriate strategies to fight online misinformation. 
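the full procedure requires the maximum-entropy (bicm) estimate of the link probabilities and a test on the resulting distribution of co-occurrences; as a rough stand-in that conveys the idea summarized in the next paragraph, and is not the authors' method, the sketch below scores the overlap of two verified users against a fixed-degree hypergeometric benchmark and keeps the pairs that survive a benjamini-hochberg correction.

```python
from itertools import combinations
from scipy.stats import hypergeom

def validated_projection(bipartite_neighbors, alpha=0.01):
    """bipartite_neighbors: {verified_user: set of unverified users who retweeted
    them or were retweeted by them}. Returns the verified-user pairs whose overlap
    is larger than expected under a hypergeometric null model (simplified benchmark,
    not the BiCM). Quadratic in the number of verified users; fine for a sketch."""
    if not bipartite_neighbors:
        return []
    n_unverified = len(set().union(*bipartite_neighbors.values()))
    tests = []
    for u, v in combinations(bipartite_neighbors, 2):
        du, dv = len(bipartite_neighbors[u]), len(bipartite_neighbors[v])
        overlap = len(bipartite_neighbors[u] & bipartite_neighbors[v])
        # P(overlap >= observed) when dv neighbors are drawn at random
        p = hypergeom.sf(overlap - 1, n_unverified, du, dv)
        tests.append((p, (u, v)))
    # Benjamini-Hochberg false discovery rate correction
    tests.sort()
    m = len(tests)
    cutoff = 0
    for i, (p, _) in enumerate(tests, start=1):
        if p <= alpha * i / m:
            cutoff = i
    return [pair for _, pair in tests[:cutoff]]
```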
while not completely unexpected, it is striking to see how political polarisation also affects the covid-19 debate, giving rise to online communities of users that, in number and structure, closely correspond to their political affiliations. this section recaps the methodology through which we have obtained the communities of verified users (see section . ). this methodology was designed in saracco et al. [ ] and applied in the field of social networks for the first time in [ , ]. for the sake of completeness, the supplementary material, section , recaps the methodology through which we have obtained the validated retweet activity network shown in section . ; in section of the supplementary material, the detection of the affiliation of unverified users is described. in the supplementary material, the interested reader will also find additional details about 1) the definition of the null models (section ); 2) a comparison among various label propagation schemes for the political affiliation of unverified users (section ); and 3) a brief state of the art on fact-checking organizations and the literature on false news detection (section ). many results in the analysis of online social networks (osn) show that users are highly clustered into groups of opinions [ , - , , , ]; indeed, those groups display some peculiar behaviours, such as the echo chamber effect [ , ]. following the example of references [ , ], we make use of this clustering of users in order to detect discursive communities, i.e. groups of users interacting among themselves by retweeting on the same (covid-related) subjects. remarkably, our procedure does not rely on an analysis of the text shared by the various users, but simply on the retweeting activity among users. in the present subsection we examine how the discursive communities of verified twitter users can be extracted. on twitter there are two distinct categories of accounts: verified and unverified users. verified users have a tick next to the screen name: the platform itself, upon request from the user, has a procedure to check the authenticity of the account. verified accounts are owned by politicians, journalists or vips in general, as well as the official accounts of ministries, newspapers, newscasts, companies and so on; for these kinds of users, the verification procedure guarantees the identity of their account and reduces the risk of malicious accounts tweeting in their name. unverified accounts belong to standard users: in this second case, we cannot trust any information provided by the users. the information carried by verified users has been studied extensively in order to provide a sort of anchor for the related discussion [ , , , , ]. to detect the political orientation we consider the bipartite network represented by verified (on one layer) and unverified (on the other layer) accounts: a link connects the verified user v with the unverified one u if at least once v was retweeted by u, or vice versa. to extract the similarity of users, we compare the commonalities with a bipartite entropy-based null model, the bipartite configuration model (bicm [ ]). the rationale is that two verified users who share many links to the same unverified accounts probably have similar visions, as perceived by the audience of unverified accounts. we then apply the method of [ ], graphically depicted in fig. , in order to get a statistically validated projection of the bipartite network of verified and unverified users.
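to make the data structure just described concrete, the following minimal python sketch (ours; the field names and input format are illustrative assumptions, not the project's actual schema) assembles the bipartite verified/unverified network from a list of retweet records. the resulting adjacency structure is what the entropy-based validation described next operates on.

```python
# minimal sketch: build the bipartite verified/unverified retweet network.
# the input format and field names are illustrative, not the study's actual schema.
from collections import defaultdict

def build_bipartite(retweets, is_verified):
    """retweets: iterable of (retweeter_id, author_id) pairs.
    is_verified: dict mapping user id -> bool.
    Returns a dict: verified user -> set of unverified users linked to it."""
    edges = defaultdict(set)
    for retweeter, author in retweets:
        # a link exists if a verified and an unverified account
        # retweeted one another at least once (in either direction)
        if is_verified.get(author, False) and not is_verified.get(retweeter, False):
            edges[author].add(retweeter)
        elif is_verified.get(retweeter, False) and not is_verified.get(author, False):
            edges[retweeter].add(author)
    return edges

# toy usage
rts = [("alice", "news_official"), ("bob", "news_official"), ("alice", "pol_account")]
ver = {"news_official": True, "pol_account": True, "alice": False, "bob": False}
print(build_bipartite(rts, ver))
```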
in a nutshell, the idea is to compare the amount of common linkage measured on the real network with the expectations of an entropy-based null model that fixes (on average) the degree sequence: if the associated p-value is so low that the overlaps cannot be explained by the model, i.e. they are not compatible with the degree-sequence expectations, they carry non-trivial information and we project the related links onto the (monopartite) projection of verified users. the interested reader can find the technical details about this validated projection in [ ] and in the supplementary information. the data that support the findings of this study are available from twitter, but restrictions apply to the availability of these data, which were used under license.

italian socio-political situation during the period of data collection

in the present subsection we present some crucial facts for understanding the social context in which our analysis is set. this subsection is divided into two parts: the contagion evolution and the political situation. these two aspects are closely related. a first covid-19 outbreak was detected in codogno, lodi, lombardy region, on february th [ ]. on the very next day, two cases were detected in vò, padua, veneto region. on february th, in order to contain the contagions, the national government decided to put the municipalities in the area around lodi and vò, near padua, in quarantine [ ]. nevertheless, the number of contagions kept rising, hitting different regions; one of the infected persons in vò died, representing the first registered italian covid-19 victim [ ]. on february th there were already confirmed cases in italy. the first lockdown should have lasted until the th of march, but due to the still increasing number of contagions in northern italy, the italian prime minister giuseppe conte moved to extend the quarantine zone to almost all of northern italy on sunday, march th [ ]: travel to and from the quarantine zone was limited to cases of extreme urgency. a draft of the decree announcing the expansion of the quarantine area appeared on the website of the italian newspaper corriere della sera in the late evening of saturday the th, causing some panic in the affected areas [ ]: around people living in milan, but coming from southern regions, took trains and planes to reach their places of origin [ ][ ]. in any case, the new quarantine zone covered the entire lombardy region and, partially, other regions. remarkably, close to bergamo, lombardy region, a new outbreak was discovered, and the possibility of defining a new quarantine area there on march th was considered: this option was later abandoned, due to the new northern-italy quarantine zone introduced in the following days. this delay seems to have caused a strong increase in the number of contagions, making the bergamo area the most affected one, in percentage, of the entire country [ ]; at the time of writing, there are ongoing investigations into the responsibility for this choice.

footnotes:
[ ] prima lodi, "paziente , il merito della diagnosi va diviso... per due", th june.
[ ] italian gazzetta ufficiale, "decreto-legge febbraio , n. ". the date is intended to be the very first day of validity of the decree.
[ ] il fatto quotidiano, "coronavirus, è morto il enne ricoverato nel padovano. contagiati in lombardia, un altro in veneto", nd february.
[ ] bbc news, "coronavirus: northern italy quarantines million people", th march.
[ ] the guardian, "leaked coronavirus plan to quarantine m sparks chaos in italy", th march.
on march th, the lockdown was extended to the whole country, making italy the first country in the world to decide on a national quarantine [ ]. travel was restricted to emergencies or work; all business activities that were not considered essential (unlike pharmacies and supermarkets) had to close. until the st of march, lockdown measures became progressively stricter all over the country. starting from the th of april, some retail activities, such as children's clothing shops, reopened. a first fall in the number of deaths was observed on the th of april [ ]. a limited reopening started with the so-called "fase 2" (phase 2) on the th of may [ ]. from the very first days of march, the limited capacity of the intensive care departments to take care of covid-infected patients made a re-organisation of italian hospitals necessary, leading, e.g., to the opening of new intensive care departments [ ]. moreover, new forms of communication with the relatives of the patients were proposed, new criteria for intubating patients were developed and, in the extreme crisis of the most infected areas, the emergency management had to give priority for hospitalisation to patients with a higher probability of recovery [ ]. outbreaks were mainly present in hospitals [ ]. unfortunately, healthcare workers were infected by the virus [ ]. this contagion resulted in a relatively high number of fatalities: by the nd of april, covid-related deaths had been registered among doctors. due to the pressure on intensive care capacity, healthcare personnel were also subject to extreme stress, especially in the most affected zones [ ]. on august th, the leader of lega, the main italian right-wing party, announced the withdrawal of lega's support from the government of giuseppe conte, which had been formed after a post-election coalition between the m s and lega itself. a new government was then formed by a coalition between the m s and the pd; matteo renzi, former secretary of the pd, subsequently formed a new center-left party, italia viva (italy alive, iv), due to some discord with the pd. despite the scission, italia viva continued to support the new government, having some of its representatives among the ministers and undersecretaries, but often marking its distance from both the pd and the m s. due to the great impact that matteo salvini and giorgia meloni (leader of fratelli d'italia, a right-wing party) have on social media, they started a massive campaign against the government the day after its inauguration. the regions of lombardy, veneto, piedmont and emilia-romagna experienced the highest numbers of contagions during the pandemic; among those, the first three are administered by right and center-right wing parties, the fourth by the pd. the disagreement between the regions and the central government over the management of the pandemic became an occasion to exacerbate the political debate (in italy, regions have quite wide autonomy in healthcare). the regions administered by the right-wing parties criticised the centralisation of the decisions regarding the lockdown, while the national government criticised the regional health management (in lombardy the healthcare system has a peculiar organisation, in which the private sector is supported by public funding) and its ineffective measures to reduce the number of contagions. the debate raged at the national level as well: the opposition criticised the financial origin of the support for the various economic sectors. moreover, the role of the european union in providing funding to help the italian economy recover after the pandemic was debated.
here, we detail the composition of the communities shown in the figure of the main text. we remind the reader that, after applying the louvain algorithm to the validated network of verified twitter users, we could observe the following main communities: right-wing parties and media (in steel blue); the center-left wing (in dark red); the stars movement (m s, in dark orange); and institutional accounts (in sky blue). starting from the center-left wing, we can find a darker red community including various ngos (the italian chapters of unicef, medecins sans frontieres, action aid, emergency, save the children, etc.), various left-oriented journalists, vips and pundits [ ]. in this group we also find political movements (the 'sardine') and politicians to the left of the pd (such as beppe civati, pietro grasso, ignazio marino) or on the left current of the pd (laura boldrini, michele emiliano, stefano bonaccini). a slightly lighter red sub-community turns out to be composed of the main politicians of the italian democratic party (pd), as well as representatives from the european parliament (italian and others) and some eu commissioners. the violet red group is mostly composed of the representatives of the newly founded italia viva and of matteo renzi, former italian prime minister (december - february ) and former secretary of the pd. in golden red we find the subcommunity of catholic and vatican groups. finally, the dark violet red and light tomato subcommunities are composed mainly of journalists. interestingly enough, the dark violet red one also contains accounts related to the city of milan (the mayor, the municipality, the public services account) and the spokesperson of the chinese ministry of foreign affairs. the orange (m s) community, in turn, shows a clear partition into substructures. in particular, the dark orange subcommunity contains the accounts of politicians, parliament representatives and ministers of the m s, together with journalists and the official account of il fatto quotidiano, a newspaper supporting the movement. interestingly, since one of the main leaders of the movement, luigi di maio, is also the italian minister of foreign affairs, we can also find in this subcommunity the accounts of several italian embassies around the world, as well as the accounts of the italian representatives at nato, ocse and oas. in aquamarine, we find the official accounts of some private and public, national and international health institutes (such as the italian istituto superiore di sanità, literally the italian national institute of health, the world health organization and the fondazione veronesi), the minister of health roberto speranza, and some foreign embassies in italy. finally, in the light slate blue subcommunity we find various italian ministers as well as the italian police and army forces. similar considerations apply to the steel blue community. in steel blue proper we find the subcommunity of center-right and right-wing parties (forza italia, lega and fratelli d'italia); the presidents of the regions of lombardy, veneto and liguria, administered by center-right and right-wing parties, can be found here. (in the following this subcommunity is going to be called fi-l-fdi, recalling the initials of the political parties contributing to this group.) the sky blue subcommunity includes the national federations of various sports, the official accounts of athletes and sports players (mostly soccer) and their teams, as well as sports journals, newscasts and journalists.
the teal subcommunity contains the main italian news agencies, some of the main national and local newspapers [ ], as well as the cartoonists makkox and vauro, the singers marracash, frankiehinrg, ligabue and the il volo vocal band, and journalists from repubblica (ezio mauro, carlo verdelli, massimo giannini) and from the la tv channel (ricardo formigli, diego bianchi), together with newscasts and their journalists. in this subcommunity there are also the accounts of many universities; interestingly enough, it also includes all the local public-service newscasts. the firebrick subcommunity contains accounts related to the as roma football club; analogously, the dark red one contains the official accounts of ac milan and its players. the slate blue subcommunity is mainly composed of the official accounts of radio and tv programs of mediaset, the main private italian broadcasting company, together with singers and musicians. other smaller subcommunities include other sports federations and sports pundits. finally, the sky blue community is mainly composed of italian embassies around the world. the navy subpartition also contains the official accounts of the president of the republic, of the italian minister of defense and of the eu commissioner for economy and former prime minister, paolo gentiloni.

in the study of every phenomenon, it is of utmost importance to distinguish the relevant information from the noise. here, we recall a framework for obtaining a validated monopartite retweet network of users: the validation accounts for the information carried not only by the activity of the users, but also by the virality of their messages. the method is represented pictorially in fig. . we define a directed bipartite network in which one layer is composed of accounts and the other of tweets. an arrow connecting a user u to a tweet t represents u writing the message t; an arrow in the opposite direction means that u is retweeting the message t. to filter out the random noise from this network, we make use of the directed version of the bicm, i.e. the bipartite directed configuration model (bidcm [ ]). the projection procedure is then analogous to the one presented in the previous subsection; it is pictorially displayed in fig. . briefly, consider a pair of users u and u' and the number of messages written by u and shared by u'. then, calculate the distribution of the same measure according to the bidcm: if the related p-value is statistically significant, i.e. if the number of u's tweets shared by u' is much larger than expected under the bidcm, we project a (directed) link from u to u'. summarising, the comparison of the observations on the real network with the bidcm allows us to uncover all contributions that cannot be explained by the constraints of the null model. using the technique described in subsection . of the main text, we are able to assign a community to almost all verified users, based on the perception of the unverified users. since the identities of verified users are checked by twitter, we have the possibility of sanity-checking our groups. indeed, as we will show in the following, the network obtained via the bipartite projection provides a reliable description of the closeness of opinions and of roles in the social debate. how can we use this information to infer the orientation of unverified users? in reference [ ] we used the tags obtained for both verified and unverified users in the bipartite network described in subsection .
of the main text and propagated those labels across the network.

[figure: schematic representation of the projection procedure for a bipartite directed network. a) an example of a real directed bipartite network; for the actual application, the two layers represent twitter accounts (turquoise) and posts (gray). a link from a turquoise node to a gray one represents that the post has been written by the user; a link in the opposite direction represents a retweet by the considered account. b) the bipartite directed configuration model (bidcm) ensemble is defined; the ensemble includes all the link realisations, once the number of nodes per layer has been fixed. c) we focus our attention on nodes i and j and count the number of directed common neighbours (in magenta both the nodes and the links to their common neighbours), i.e. the number of posts written by i and retweeted by j. d) we compare this measure on the real network with the one on the ensemble: if this overlap is statistically significant with respect to the bidcm, e) we have a link from i to j in the projected network.]

in a recent analysis, we observed that other approaches are more stable [ ]: in the present manuscript we make use of the most stable algorithm. we use the label propagation as proposed in [ ] on the directed validated network.

in the present appendix we recall the main steps for the definition of an entropy-based null model; the interested reader can refer to the review [ ]. we start by revising the bipartite configuration model [ ], which has been used for detecting the network of similarities of verified users. we then examine the extension of this model to bipartite directed networks [ ]. finally, we present the general methodology to project the information contained in a (directed or undirected) bipartite network, as developed in [ ]. let us consider a bipartite network G*_bi, in which the two layers are L and Γ, and define G_bi as the ensemble of all possible graphs with the same number of nodes per layer as in G*_bi. it is possible to define the entropy related to the ensemble as [ ]

S = -\sum_{G_{bi} \in \mathcal{G}_{bi}} P(G_{bi}) \ln P(G_{bi}),

where P(G_bi) is the probability associated to the instance G_bi. now we want to obtain the maximum-entropy configuration, constraining some relevant topological information regarding the system. for the bipartite representation of verified and unverified users, a crucial ingredient is the degree sequence, since it is a proxy of the number of interactions (i.e. tweets and retweets) with the other class of accounts; thus, in the present manuscript we focus on the degree sequence. let us then maximise the entropy above, constraining the ensemble average of the degree sequence. it can be shown [ ] that the probability distribution over the ensemble factorises as

P(G_{bi}) = \prod_{i \in L} \prod_{\alpha \in \Gamma} p_{i\alpha}^{m_{i\alpha}} (1 - p_{i\alpha})^{1 - m_{i\alpha}},

where the m_{iα} represent the entries of the biadjacency matrix describing the bipartite network under consideration and p_{iα} is the probability of observing a link between the nodes i ∈ L and α ∈ Γ. the probability p_{iα} can be expressed in terms of the lagrangian multipliers x and y for nodes on the L and Γ layers, respectively, as

p_{i\alpha} = \frac{x_i y_\alpha}{1 + x_i y_\alpha}.

in order to obtain the values of x and y that maximise the likelihood of observing the real network, we need to impose the following conditions [ , ]:

\sum_{\alpha \in \Gamma} p_{i\alpha} = k_i^* \quad \forall i \in L, \qquad \sum_{i \in L} p_{i\alpha} = k_\alpha^* \quad \forall \alpha \in \Gamma,

where the * indicates quantities measured on the real network. actually, the real network is sparse: the bipartite network of verified and unverified users has a very low connectance ρ.
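as a practical aside, the likelihood conditions above can be solved numerically; the fixed-point iteration below is one standard way of doing so and is offered as a sketch under our own assumptions, not as the solver actually used for the study.

```python
import numpy as np

def solve_bicm(deg_rows, deg_cols, n_iter=2000, tol=1e-10):
    """Solve the BiCM likelihood conditions sum_a p_ia = k_i*, sum_i p_ia = k_a*
    for the multipliers x_i, y_a by fixed-point iteration, and return the matrix
    of link probabilities p_ia = x_i y_a / (1 + x_i y_a)."""
    kr = np.asarray(deg_rows, dtype=float)
    kc = np.asarray(deg_cols, dtype=float)
    x = np.ones_like(kr)
    y = np.ones_like(kc)
    for _ in range(n_iter):
        # x_i <- k_i* / sum_a [ y_a / (1 + x_i y_a) ], and symmetrically for y_a
        x_new = kr / np.sum(y / (1.0 + np.outer(x, y)), axis=1)
        y_new = kc / np.sum(x_new[:, None] / (1.0 + np.outer(x_new, y)), axis=0)
        if np.allclose(x_new, x, atol=tol) and np.allclose(y_new, y, atol=tol):
            x, y = x_new, y_new
            break
        x, y = x_new, y_new
    xy = np.outer(x, y)
    return xy / (1.0 + xy)

# toy example: 3 verified users x 4 unverified accounts
p = solve_bicm([2, 1, 3], [2, 1, 2, 1])
print(p.round(3))
print(p.sum(axis=1))  # row sums reproduce the observed row degrees
```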
given this sparsity, the link probability above can be safely approximated with the chung-lu configuration model, i.e.

p_{i\alpha} \simeq \frac{k_i^* k_\alpha^*}{m},

where m is the total number of links in the bipartite network. in the present subsection we consider the extension of the bicm to directed bipartite networks and highlight the peculiarities of the network under analysis in this representation. the adjacency matrix describing a directed bipartite network with layers L and Γ has a peculiar block structure, once nodes are ordered by layer membership (here the nodes on the L layer first):

A = \begin{pmatrix} O & M \\ N^T & O \end{pmatrix},

where the O blocks represent null matrices (they describe links connecting nodes inside the same layer, which by construction are exactly zero) and M and N are non-zero blocks, describing links connecting nodes on layer L with those on layer Γ and vice versa. in general M ≠ N, otherwise the network would not be distinguishable from an undirected one. we can apply the same machinery as in the section above, but with the degree sequence extended to a directed degree sequence, i.e. considering the in- and out-degrees for nodes on the layer L,

k_i^{out} = \sum_{\alpha \in \Gamma} m_{i\alpha}, \qquad k_i^{in} = \sum_{\alpha \in \Gamma} n_{i\alpha}

(here m_{iα} and n_{iα} represent respectively the entries of the matrices M and N), and for nodes on the layer Γ,

k_\alpha^{in} = \sum_{i \in L} m_{i\alpha}, \qquad k_\alpha^{out} = \sum_{i \in L} n_{i\alpha}.

the definition of the bipartite directed configuration model (bidcm [ ]), i.e. the extension of the bicm above, follows closely the same steps described in the previous subsection. interestingly enough, the probabilities relative to the presence of links from L to Γ are independent of the probabilities relative to the presence of links from Γ to L. if q_{iα} is the probability of observing a link from node i to node α, and q'_{iα} the probability of observing a link in the opposite direction, we have

q_{i\alpha} = \frac{x_i^{out} y_\alpha^{in}}{1 + x_i^{out} y_\alpha^{in}}, \qquad q'_{i\alpha} = \frac{x_i^{in} y_\alpha^{out}}{1 + x_i^{in} y_\alpha^{out}},

where x_i^{out} and x_i^{in} are the lagrangian multipliers relative to the node i ∈ L, respectively for the out- and the in-degrees, and y_α^{out} and y_α^{in} are the analogous quantities for α ∈ Γ. in the present application we have some simplifications: the bipartite directed network representation describes users (on one layer) writing and retweeting posts (on the other layer). if users are on the layer L, posts are on the opposite layer and m_{iα} represents user i writing post α, then k_α^{in} = 1 for all α ∈ Γ, since each message cannot have more than one author. notice that, since our constraints are conserved only on average, the ensemble of all possible realisations also contains instances in which k_α^{in} > 1 or k_α^{in} = 0, i.e. non-physical ones; nevertheless, the average is constrained to the right value, i.e. 1. the fact that k_α^{in} is the same for every α allows for a great simplification of the probability per link on M:

q_{i\alpha} = \frac{k_i^{out}}{N_\Gamma},

where N_Γ is the total number of nodes on the Γ layer. this simplification is extremely helpful in the projected validation of the bipartite directed network [ ]. the information contained in a bipartite (directed or undirected) network can be projected onto one of the two layers. the rationale is to obtain a monopartite network encoding the non-trivial interactions between the two layers of the original bipartite network. the method is quite general, once we have a null model in which the link probabilities are independent, as is the case for both the bicm and the bidcm [ ]. the first step is the definition of a bipartite motif that captures the non-trivial similarity (in the case of an undirected bipartite network) or flux of information (in the case of a directed bipartite network) between two nodes of the same layer.
this quantity can be captured by the number of V-motifs between users i and j [ , ],

V_{ij} = \sum_{\alpha \in \Gamma} m_{i\alpha} m_{j\alpha},

or by its directed extension V^{d}_{ij} = \sum_{\alpha \in \Gamma} m_{i\alpha} n_{j\alpha}, counting the posts written by i and retweeted by j (note that, in general, V^{d}_{ij} ≠ V^{d}_{ji}). we compare the abundance of these motifs with the null models defined above: all motifs that cannot be explained by the null model, i.e. whose p-values are statistically significant, are validated into the projection on one of the layers [ ]. in order to assess the statistical significance of the observed motifs, we calculate the distribution associated with them. for instance, the expected number of V-motifs connecting i and j in an undirected bipartite network is

\langle V_{ij} \rangle = \sum_{\alpha \in \Gamma} p_{i\alpha} p_{j\alpha},

where the p_{iα} are the probabilities of the bicm. analogously, in the directed case,

\langle V^{d}_{ij} \rangle = \sum_{\alpha \in \Gamma} q_{i\alpha} q'_{j\alpha} = \frac{k_i^{out}}{N_\Gamma} \sum_{\alpha \in \Gamma} q'_{j\alpha},

where in the last step we use the simplification above [ ]. in both the directed and the undirected case, the distribution of the V-motifs (or of their directed extensions) is a poisson-binomial one, i.e. a binomial distribution in which each event has a different probability. in the present case, due to the sparsity of the analysed networks, we can safely approximate the poisson-binomial distribution with a poisson one [ ]. in order to state the statistical significance of the observed values, we calculate the related p-values according to the respective null models. once we have a p-value for every detected V-motif, the related statistical significance can be established through the false discovery rate (fdr) procedure [ ]. with respect to other multiple-hypothesis-testing procedures, fdr controls the number of false positives. in our case, the rejected hypotheses identify the V-motifs that cannot be explained by the ingredients of the null model alone and thus carry non-trivial information about the system. in this sense, the validated projected network includes a link for every rejected hypothesis, connecting the nodes involved in the related motifs.
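to make the validation pipeline concrete, the sketch below (our own illustration) projects a toy biadjacency matrix onto the layer of verified users using the poisson approximation and the benjamini-hochberg fdr procedure described above; the homogeneous null probabilities used here are a crude stand-in for the bicm ones.

```python
import numpy as np
from scipy.stats import poisson

def validated_projection(M, P, alpha=0.05):
    """Project the biadjacency matrix M (rows = verified users) onto the row
    layer, keeping only pairs whose V-motif count is significant under the
    null-model probabilities P (Poisson approximation), after a
    Benjamini-Hochberg false discovery rate correction."""
    n = M.shape[0]
    V = M @ M.T                      # observed V-motifs between rows i and j
    lam = P @ P.T                    # expected V-motifs under the null model
    pairs, pvals = [], []
    for i in range(n):
        for j in range(i + 1, n):
            # survival function gives P(X >= V_ij) for X ~ Poisson(lam_ij)
            pvals.append(poisson.sf(V[i, j] - 1, lam[i, j]))
            pairs.append((i, j))
    order = np.argsort(pvals)
    m = len(pvals)
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * rank / m:
            k_max = rank
    return {pairs[idx] for rank, idx in enumerate(order, start=1) if rank <= k_max}

# toy example: 4 verified users x 10 posts/unverified accounts
M = np.array([[1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
              [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]])
P = np.full(M.shape, M.mean())       # crude homogeneous null, for illustration
print(validated_projection(M, P))    # only the pair (0, 1) survives validation
```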
in the main text, we solved the problem of assigning an orientation to all relevant users in the validated retweet network via label propagation. the approach is similar to, but different from, the one proposed in [ ], the differences being in the starting labels, in the label propagation algorithm and in the network used. in this section we revise the method employed in the present article, compare it to the one in [ ] and evaluate the deviations from other approaches. the first step of our methodology is to extract the polarisation of verified users from the bipartite network, as described in section . of the main text, in order to use it as seed labels in the label propagation. in reference [ ], a measure of the "adherence" of the unverified users towards the various communities of verified users was used to infer their orientation, following the approach in [ ], in turn based on the polarisation index defined in [ ]. this approach performed extremely well when practically all unverified users interacted at least once with a verified one, as in [ ]. while it still performed well on a different dataset such as the one studied in [ ], we observed isolated deviations: this was the case for users with frequent interactions with other unverified accounts of the same (political) orientation who randomly retweeted a verified user from a different discursive community. in such cases, focusing just on the interactions with verified accounts, those nodes were assigned a wrong orientation. in [ ], the polarisation labels of the unverified users defined in this way were subsequently used as seed labels in the label propagation.
due to the possibility, described above, of assigning wrong labels to unverified accounts, in the present paper we consider only the tags of verified users, since these pass a strict validation procedure and are more stable. in order to compare the results obtained with the various approaches, we calculated the variation of information (vi, [ ]). vi measures exactly the difference in information content between two partitions, as captured by the shannon entropy. results are reported in the matrix in figure for the th of february (results are similar for other days). even when using the weighted retweet network as the "exact" result, the partition found by the label propagation of our approach shows little loss of information, comparable with that of an unweighted approach. indeed, the results found by the various community detection algorithms show little agreement with the label propagation ones. nevertheless, we still prefer the label propagation procedure, since the validated projection on the layer of verified users is theoretically sound and has a non-trivial interpretation. the main result of this work quantifies the level of diffusion on twitter of news published by sources considered scarcely reputable. academia, governments, and news agencies are working hard to classify information sources according to criteria of credibility and transparency of the published news. this is the case, for example, of newsguard (https://www.newsguardtech.com/), which we used for tagging the most frequent domains in the directed validated network obtained according to the methodology presented in the previous sections. as introduced in subsection . of the main text, the newsguard browser extension and mobile app [ ] offer a reliability rating for the most popular newspapers in the world, summarising with a numerical score the level of credibility and journalistic transparency of the newspaper. with the same philosophy, but oriented towards us politics, the fact-checking site politifact.com reports with a 'truth meter' the degree of truthfulness of claims made by politicians, candidates, their staff and, more in general, protagonists of us politics. snopes.com, one of the oldest fact-checking websites, covers hoaxes and urban legends in addition to political figures. generally speaking, a fact-checking site has behind it a multitude of editors and journalists who, with a great deal of energy, manually check the reliability of a news item, or of its publisher, by evaluating criteria such as the tendency to correct errors, the nature of the newspaper's finances, and whether there is a clear differentiation between opinions and facts. it is therefore worth noting that recent attempts have tried to automatically find articles worthy of being fact-checked. for example, the work in [ ] uses a supervised classifier, based on an ensemble of neural networks and support vector machines, to figure out which politicians' claims need to be debunked, and which have already been debunked. despite the tremendous effort of stakeholders to keep fact-checking sites up to date and functioning, disinformation resists debunking due to a combination of factors.
there are psychological aspects, like the quest for belonging to a community and for reassuring answers, the adherence to one's own viewpoint and an innate reluctance to change opinion [ , ], as well as the formation of echo chambers [ ], where people polarize their opinions as they are insulated from contrary perspectives: these are key factors leading people to contribute to the success of disinformation spreading [ , ]. moreover, researchers have demonstrated how the spreading of false news is strategically supported by the massive and organized use of trolls and bots [ ]. alongside the need to educate users towards a conscious consumption of online information through means other than purely technological solutions, there is a series of promising works that exploit classifiers based on machine learning or deep learning to tag a news item as credible or not. one interesting approach is based on the analysis of spreading patterns on social platforms. monti et al. recently provided a deep learning framework for the detection of fake news cascades [ ]. a ground truth is acquired by following the example of vosoughi et al. [ ], collecting twitter cascades of verified false and true rumors. employing a novel deep learning paradigm for graph-based structures, cascades are classified based on user profile, user activity, network and spreading, and content. the main result of the work is that 'a few hours of propagation are sufficient to distinguish false news from true news with high accuracy'. this result has been confirmed by other studies too. the work in [ ], by zhao et al., examines diffusion cascades on weibo and twitter: focusing on topological properties, such as the number of hops from the source and the heterogeneity of the network, the authors demonstrate that networks in which fake news diffuses have markedly different characteristics from those diffusing genuine information. the investigation of diffusion networks thus appears to be a promising path to follow for fake news detection. this is also confirmed by pierri et al. [ ]: here too, the goal is to classify 'news articles pertaining to bad and genuine information by solely inspecting their diffusion mechanisms on twitter'. even in this case, results are impressive: a simple logistic regression model is able to correctly classify news articles with a high accuracy (auroc up to %).

references:
the political blogosphere and the u.s. election: divided they blog
coronavirus: 'deadly masks' claims debunked
coronavirus: bill gates 'microchip' conspiracy theory and other vaccine claims fact-checked
extracting significant signal of news consumption from social networks: the case of twitter in italian political elections
fast unfolding of communities in large networks
influence of fake news in twitter during the us presidential election
how does junk news spread so quickly across social media? algorithms, advertising and exposure in public life
the role of bot squads in the political propaganda on twitter
tracking social media discourse about the covid-19 pandemic: development of a public coronavirus twitter data set
the statistical physics of real-world networks
political polarization on twitter
predicting the political alignment of twitter users
partisan asymmetries in online political activity
echo chambers: emotional contagion and group polarization on facebook
mapping social dynamics on facebook: the brexit debate
tackling covid-19 disinformation - getting the facts right
speech of vice president věra jourová on countering disinformation amid covid-19 - from pandemic to infodemic
filter bubbles, echo chambers, and online news consumption
community detection in graphs
finding users we trust: scaling up verified twitter users using their communication patterns
opinion dynamics on interacting networks: media competition and social influence
near linear time algorithm to detect community structures in large-scale networks
randomizing bipartite networks: the case of the world trade web
inferring monopartite projections of bipartite networks: an entropy-based approach
maximum-entropy networks. pattern detection, network reconstruction and graph combinatorics
journalists on twitter: self-branding, audiences, and involvement of bots
emotional dynamics in the age of misinformation
debunking in a world of tribes
coronavirus, a milano la fuga dalla "zona rossa": folla alla stazione di porta garibaldi
coronavirus, l'illusione della grande fuga da milano. ecco i veri numeri degli spostamenti verso sud
coronavirus: italian army called in as crematorium struggles to cope with deaths
coronavirus: italy extends emergency measures nationwide
italy sees first fall of active coronavirus cases: live updates
coronavirus in italia, verso primo ok spostamenti dal / , non tra regioni
italy's health care system groans under coronavirus - a warning to the world
negli ospedali siamo come in guerra. a tutti dico: state a casa
coronavirus: ordini degli infermieri, mila i contagiati
automatic fact-checking using context and discourse information
extracting significant signal of news consumption from social networks: the case of twitter in italian political elections
controlling the false discovery rate: a practical and powerful approach to multiple testing
users polarization on facebook and youtube
fast unfolding of communities in large networks
the role of bot squads in the political propaganda on twitter
the psychology behind fake news
the statistical physics of real-world networks
fake news: incorrect, but hard to correct. the role of cognitive ability on the impact of false information on social impressions
echo chambers: emotional contagion and group polarization on facebook
graph theory (graduate texts in mathematics)
resolution limit in community detection
maximum likelihood: extracting unbiased information from complex networks (phys rev e)
on computing the distribution function for the poisson binomial distribution
reconstructing mesoscale network structures
the contagion of ideas: inferring the political orientations of twitter accounts from their connections
comparing clusterings by the variation of information
fake news detection on social media using geometric deep learning
at the epicenter of the covid-19 pandemic and humanitarian crises in italy: changing perspectives on preparation and mitigation
catal non-issue content
near linear time algorithm to detect community structures in large-scale networks
randomizing bipartite networks: the case of the world trade web
inferring monopartite projections of bipartite networks: an entropy-based approach
the spread of low-credibility content by social bots
analytical maximum-likelihood method to detect patterns in real networks
a question of belonging: race, social fit, and achievement
cognitive and social consequences of the need for cognitive closure
fake news propagate differently from real news even at early stages of spreading

analysis of online misinformation during the peak of the covid-19 pandemics in italy - supplementary material
guido caldarelli, rocco de nicola, marinella petrocchi, manuel pratelli and fabio saracco

there is another difference between the label propagation used here and the one in [ ]: in the present paper we used the label propagation of [ ], while the one in [ ] was quite home-made. as in reference [ ], the seed labels of [ ] are fixed, i.e. they are not allowed to change [ ]. the main difference is that, in case of a draw among the labels of the first neighbours, in [ ] the tie is broken randomly, while in the algorithm of [ ] the label is not assigned and goes into a new run, with the newly assigned labels. moreover, the update of labels in [ ] is asynchronous, while it is synchronous in [ ]. we opted for the one in [ ] because it is effectively a standard among label propagation algorithms, being stable, more studied, and faster [ ]. finally, differently from the procedure in [ ], we applied the label propagation not to the entire (undirected version of the) retweet network, but to the (undirected version of the) validated one. (the intent in choosing the undirected version is that in both cases, i.e. when a generic account is significantly retweeting or being significantly retweeted by another one, the two accounts probably share some vision of the phenomena under analysis, so we are not interested in the direction of the links in this situation.) the rationale for using the validated network is to reduce the calculation time (due to the dimensions of the dataset), while still obtaining an accurate result. while the previous differences from the procedure of [ ] are dictated by conservativeness (the choice of the seed labels) or by adherence to a standard (the choice of [ ]), this last one may be debatable: why should the validated network return "better" results than those calculated on the entire retweet network? we considered the case of a single day (in order to reduce the calculation time) and studied different approaches: a louvain community detection [ ] on the undirected version of the validated network of retweets; a louvain community detection on the undirected version of the unweighted retweet network; a louvain community detection on the undirected version of the weighted retweet network, in which the weights are the numbers of retweets from user to user; a label propagation a la raghavan et al. [ ] on the directed validated network of retweets; a label propagation a la raghavan et al. on the (unweighted) retweet network; and a label propagation a la raghavan et al. on the weighted retweet network, the weights being the numbers of retweets from user to user. actually, due to the order dependence of louvain [ ], we ran the louvain algorithm several times after reshuffling the order of the nodes, taking the partition into communities that maximises the modularity.
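a compact sketch of the robustness checks just described is given below, assuming a recent version of networkx; the function names and the toy graph are ours, and the variation-of-information routine implements the standard definition used for the comparisons reported in the main text.

```python
import math
import networkx as nx
from networkx.algorithms import community

def best_louvain(G, runs=20):
    """Run Louvain with different random seeds (standing in for reshuffled node
    orders) and keep the partition that maximises the modularity."""
    best, best_q = None, float("-inf")
    for seed in range(runs):
        part = community.louvain_communities(G, seed=seed)
        q = community.modularity(G, part)
        if q > best_q:
            best, best_q = part, q
    return best, best_q

def variation_of_information(part_a, part_b):
    """VI(A, B) = H(A) + H(B) - 2 I(A; B), computed from two partitions
    (lists of node sets) of the same node set."""
    n = sum(len(c) for c in part_a)
    vi = 0.0
    for a in part_a:
        for b in part_b:
            nab = len(a & b)
            if nab:
                p_ab, p_a, p_b = nab / n, len(a) / n, len(b) / n
                vi -= 2 * p_ab * math.log(p_ab / (p_a * p_b))            # -2 I(A; B)
    vi -= sum((len(a) / n) * math.log(len(a) / n) for a in part_a)        # + H(A)
    vi -= sum((len(b) / n) * math.log(len(b) / n) for b in part_b)        # + H(B)
    return vi

# toy comparison on a standard benchmark graph
G = nx.karate_club_graph()
louvain_part, q = best_louvain(G, runs=10)
lpa_part = list(community.asyn_lpa_communities(G, seed=1))  # Raghavan et al. label propagation
print(round(q, 3), round(variation_of_information(louvain_part, lpa_part), 3))
```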
similarly, the label propagation of [ ] has a certain level of randomness: we run it several times and choose the most frequent label assignment for every node. key: cord- -e zojanb authors: lieberoth, andreas; pedersen, mads kock; marin, andreea catalina; planke, tilo; sherson, jacob friis title: getting humans to do quantum optimization - user acquisition, engagement and early results from the citizen cyberscience game quantum moves date: - - journal: nan doi: . /hc.v i . sha: doc_id: cord_uid: e zojanb the game quantum moves was designed to pit human players against computer algorithms, combining their solutions into hybrid optimization to control a scalable quantum computer. in this midstream report, we open our design process and describe the series of constitutive building stages going into a quantum physics citizen science game. we present our approach from designing a core gameplay around quantum simulations, to putting extra game elements in place in order to frame, structure, and motivate players' difficult path from curious visitors to competent science contributors. the player base is extremely diverse - for instance, two top players are a year old female accountant and a male taxi driver. among statistical predictors for retention and in-game high scores, the data from our first year suggest that people recruited based on real-world physics interest and via real-world events, but only with an intermediate science education, are more likely to become engaged and skilled contributors. interestingly, female players tended to perform better than male players, even though men played more games per day. to understand this relationship, we explore the profiles of our top players in more depth. we discuss in-world and in-game performance factors departing in psychological theories of intrinsic and extrinsic motivation, and the implications for using real live humans to do hybrid optimization via initially simple, but ultimately very cognitively complex games. when online participants are used as workhorses for difficult problems such as eterna's needle-inhaystack-like rna model selection task (lee et al, ) or eyewire's formidable challenge of mapping neural connectivity in the mouse retina (marx, ) , conclusive results may lie months and years into the future. in the rapidly growing field of human computation the design of new initiatives is often based on intuition rather than proven design hypotheses. the citizen cyberscience community is, however, a remarkably open scientific group, which allows us to capitalize on a unique and generous culture to learn from each other at each step of the journey -not just in the end, when all is securely tested and published. the citizen cyberscience game quantum moves was designed to help build a quantum computer -a computer more powerful than any other in the world based on moving atoms around under the principles governing quantum physics. this delicate process involves a constant risk of losing hold on the volatile atoms, if they are not moved precisely and quickly. the simulations in which our algorithms tried to optimize this process bear a remarkable similarity to side scrolling casual games, and so the notion of human quantum optimization was hatched: what if real humans would do things differently than the logical step-by-step nature of the algorithms? would explicit understandings of the counterintuitive quantum problems help people solve our problem more intelligently? 
would human physical and cognitive fallibility add an interesting random factor? and even if people in general could not beat the ai, would play trajectories resulting from certain lucky punches or persistent quantum heroes be enough to help the ais learn in new ways? this is the premise of human-computer hybrid optimization: helping ais learn through real people's blooming, buzzing mess of solutions, when problems can be represented as engaging game levels. quantum moves places itself alongside eyewire, galaxy zoo, foldit and eterna in a small category of large-scale, resource-demanding online citizen cyberscience endeavors where real problem solving takes place beyond pure data gathering. with this midstream paper, we want to share our experiences from this first year of beta design and player recruitment, and make our reflections and learning curves available. we believe that our findings about the engagement process, although preliminary, contain a series of conclusions pertinent to the design and engagement processes hidden behind human computation. we first give a brief introduction to the physics behind the game and the method of turning its core tenets into a playable game. this paper does not focus on the actual game results, and the reader with no interest in physics can safely skip the first part. we then turn to the main focus of the paper, the description of the design considerations from the first year, especially relating to user acquisition and the structural gameplay surrounding the game's core loop. finally, we present data about participation for the beta year, noting how different properties like recruitment source and physics interest predict tenacity and performance in the game. we conclude by looking closer at our most dedicated "heroes" and discussing future perspectives for human computation and quantum moves. quantum mechanics originated in the beginning of the th century, when the physicists of the time realized that the known laws of physics were not capable of describing the structure of atoms. experiments made by ernest rutherford showed that the atom had to consist of a positively charged nucleus orbited by negatively charged particles called electrons (longair, ). niels bohr subsequently showed that only certain orbits were allowed, and that the electron could only jump from one orbit to another by absorbing or emitting a quantum of light with the correct amount of energy. even more remarkably, the electron was allowed to be in two different orbits simultaneously until a measurement was performed to determine in which of the orbits it was. from this, quantum mechanics evolved through the work of heisenberg (born, ), de broglie ( ), schrödinger ( ) and many others. they showed that atoms should be described by a wave function: a distribution describing the probabilities of measuring the atom at every point in space. computer technology has also progressed rapidly over the past decades. moore's law states that available computer power doubles every months (moore, ), due to the ability to fabricate ever-smaller transistors. however, the miniaturization has a lower limit due to quantum effects. this led to the proposal of the quantum computer (feynman, ), in which the computer bits, which are traditionally only allowed to be either 0 or 1, are replaced by quantum bits (qubits) that are allowed to be 0 and 1 simultaneously.
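to make the notion of a qubit being 0 and 1 simultaneously concrete, a single qubit state can be written, in standard quantum-mechanical notation (our formulation, not a quotation from the sources cited above), as a superposition

|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \qquad |\alpha|^2 + |\beta|^2 = 1,

and a register of n qubits correspondingly lives in a 2^n-dimensional space of superpositions, |\psi\rangle = \sum_{x \in \{0,1\}^n} c_x |x\rangle.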
a quantum computer holds the potential for huge calculation power, since a register of qubits is capable of representing exponentially many numbers simultaneously (quickly exceeding the total number of atoms in the universe), in contrast to a normal computer, where the same number of bits can represent just one number at a time. quantum computers have already been created in very small systems of a few qubits (vandersypen, ), but they have yet to be implemented in a scalable system capable of outperforming traditional computers. many proposals for such scalable architectures exist in various systems; in one of them, atoms are contained in an egg-tray-like trap made of interfering light beams (weitenberg, a; weitenberg, b). the quantum computation is then performed by concatenating a sequence of individual qubit-flip operations (jørgensen, ) and two-qubit operations consisting of picking up an atom in one well with a focused tweezer of light and transporting it into contact with another atom somewhere else in the computer (weitenberg, b). this is non-trivial because any fast movement of the tweezer causes the (probability distribution of the) atom to slosh around. this sloshing results in an error in the calculation, because a sloshing atom contains kinetic energy and is therefore not in the ground state. the challenge of quantum moves consists in finding algorithms describing how the laser in the physical machine should be controlled to move atoms quickly from one location to another without introducing sloshing at the end. in quantum moves, players help build a powerful quantum computer by finding ways of moving a simulated atom from one location in the game interface to another without inducing sloshing. the movement of the atom is guided and constrained by a so-called potential landscape spanning the screen (black line in figure ). the probability distribution of the atom (green in figure ) resembles a liquid, but will slosh and distribute itself in smaller waves at the slightest wrong movement, according to the rules of quantum physics. we call the collective shape of the atom at any given instant of time its state. each level describes a unique problem, represented by the potential landscape's line combined with pre-specified beginning and target states. success is measured by the degree of overlap between the final state of the atom and that of the target area. a game always consists of controlling the simulated tweezer with your computer mouse for a given amount of time (by moving the bottom of an indentation in the potential landscape). dragging the mouse horizontally changes the position, whereas a vertical move increases or decreases the depth of the trapping indentation (physically realized by turning the power of the laser up or down). a game consists of one complete trajectory of the mouse through a particular level. this can be characterized as the game's "core loop" (fields, ) or, fittingly, its "game atom" (elias, garfield, & gutschera, ), as it recurs in every game level, variably modified with new obstacles, potential landscapes, goal states and bonus points to challenge players or simulate particular physics problems. the levels are organized according to an overall structural gameplay, or metagame, in which they get unlocked in stepwise fashion and players receive rewards through different kinds of symbolic feedback such as high scores, "stars" awarded according to the degree of success, and the acquisition of skill and achievement badges. the entire solution for each game played is stored on our servers for potential future use in our laboratory.
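the "degree of overlap" used as the success measure above can be stated precisely; in standard notation (again our formulation, not a quotation from the game's documentation) it is the fidelity between the final state of the atom and the target state,

F = \left| \langle \psi_{\mathrm{target}} \,|\, \psi(T) \rangle \right|^2 = \left| \int \psi_{\mathrm{target}}^{*}(x)\, \psi(x, T)\, dx \right|^2,

which equals 1 for a perfect transfer and drops towards 0 when the atom ends up displaced from, or sloshing around, the target state.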
computer algorithms used to optimize this type of problem typically come in two variants. in local algorithms, an initial trajectory is slightly perturbed in a stepwise fashion, and if a change proves beneficial, the algorithm continues further in that same direction. this step, although it sounds simple, can be very sophisticated in state-of-the-art algorithms. unfortunately, this deterministic update often causes the algorithm to get stuck in what is called a local maximum. in contrast, algorithms with a random component, such as genetic optimization algorithms, can jump erratically from one solution to the next. this randomness ensures that they will never get stuck and will eventually find the best solution, called the global maximum. the disadvantage of this type of optimization is that the steps are random and therefore only beneficial in a small fraction of cases. this makes algorithms with random components exceedingly slow, often in fact so sluggish that in practice they will never converge to the optimal solution. the aim of quantum moves is to combine the best of both worlds in our gamified human quantum optimization: optimization that is rational most of the time, but sometimes makes seemingly random errors or leaps of intuition to rapidly find the sought-after solutions. one example of a gameplay trajectory which requires a certain critical breakthrough (iacovides, aczel, scanlon, & woods, ) is the level bring home water fast (see figure ). the aim is to fetch an atom from the well on the far right of the screen and ferry it carefully back to your beginning position on the left. here, a good solution consists in utilizing the principle of quantum tunneling by bringing the two wells close together without merging them into one. the atom will then tunnel through the classically forbidden intermediate region and appear in the well created by the movable tweezer. specifically, we compare the players to the computer algorithms in two ways. first, for problems which can eventually be solved by the computer, we compare the score after each optimization to the equivalent high score of players after having played an equal number of times. as long as the player score is higher than the computer score, the human result really represents the fastest way of getting results at that particular juncture. of course, since the specialty of the computer is to make minute adjustments and improvements, if the particular problem is of a nature that can be solved by the computers, it will eventually overtake the players. in such cases it is an extremely interesting question to what extent player results can be used as a starting point for a computer optimization that will yield a faster convergence rate than the computer optimization alone. finally, for some of the problems posed, the computer fails to find good solutions, and it is of course extremely interesting to what extent players can find these solutions. although the analysis of the more than , unique play trajectories generated so far is not yet complete, a pattern seems to emerge: the human optimization is indeed superior to the computer in many of the problem spaces investigated, in all three of the areas discussed above. even more surprisingly, it seems that the fraction of the players actually outperforming the computer is quite large.
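the contrast between deterministic local search and a good human seed can be illustrated with a small sketch; the objective function below is a simple placeholder standing in for the fidelity returned by the quantum simulation, and the whole snippet is our own illustration rather than the actual quantum moves optimization code.

```python
import random

def local_optimize(path, objective, n_iter=5000, step=0.05):
    """Stepwise hill-climbing on a control trajectory: perturb one point at a
    time and keep the change only if the objective improves. The deterministic
    acceptance rule is what makes this kind of optimiser prone to local maxima."""
    best = list(path)
    best_score = objective(best)
    for _ in range(n_iter):
        i = random.randrange(len(best))
        candidate = list(best)
        candidate[i] += random.uniform(-step, step)
        score = objective(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

def toy_objective(path):
    """Placeholder for the simulated fidelity: reward a smooth tweezer path
    that ends at the target position (here set to 1.0)."""
    smoothness = -sum((path[i + 1] - path[i]) ** 2 for i in range(len(path) - 1))
    at_target = -abs(path[-1] - 1.0)
    return smoothness + at_target

random.seed(0)
algorithm_seed = [0.0] * 20                  # a cold start for the computer
player_seed = [i / 19 for i in range(20)]    # a human-drawn ramp towards the target
print(local_optimize(algorithm_seed, toy_objective)[1])
print(local_optimize(player_seed, toy_objective)[1])  # the human ramp typically gives a head start
```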
whether the players succeed by fluke, by building in-game skills, or possibly through a simple theoretical understanding of the quantum physics principles represented in the game is still an open question, which will be the topic of future research. a central challenge to achieve the optimization described above is that quantum moves needs to recruit engageable players and hone their in-game skills over time. player acquisition is a central part of online game launch strategies. the industry metric user acquisition cost (uac) is an aggregate of advertisement cost, development cost, back-end expenditures and similar prices used to describe how much money a social game developer will spend, on average, to get a new user (fields, ) . developers thus expect to spend quite a bit of time and money to get click-through and installations of their games. this is both an ongoing process of marketing and design, and a focused enterprise to build critical mass at launch. however, while the commercial industry needs to balance this effort and expenditure against each player's average lifetime value (ltv) in dollars and cents, as well as their ability to make the game a social place and recruit friends to join them (lifetime network value, lnv), citizen science games need to consider the time and relative price of a different kind of payoffnamely tangible science contribution from each player and his network. we can label the average direct player contribution user science value (usv). player contributions on our sister-game galaxy zoo have been aggregated into four main clusters, or participation profiles. these reveal that some contribute a lot early, usually on their first login, never to return, while others establish a stable pattern of play over a prolonged period of time (brasiliero, ; ponciano, brasiliero, simposon & smith, ) . each galaxy pattern recognized is a worthwhile addition of data. in more complex games like eyewire (robinson, personal communication, / . ) and quantum moves, however, players need to build a modicum of skill before they can reliably contribute to the core scientific challenges (barring flukes arising from the random factor that is human cognitive and behavioral processing, i.e. bob, ) . we could call the average point at which players start doing anything directly useful, the game's user contribution threshold (uct). we have come to call the small percentage of players who tenaciously acquire the necessary skills and persist far beyond the uct "heroes". our notion of heroes mirrors that of whales from commercial games that rely on free-to-play strategies where only a small group of players are ever really monetized through in-game purchases, premium memberships and the like. while citizen science games with low cognitive complexity can benefit from every minute any player spends, games with high cognitive complexity like quantum moves and eyewire only usually gain direct value from a player if he/she becomes a hero. players who just visit out of curiosity and then drop out can be called flâneurs, while average participants explore the structural gameplay deeper, and might contribute by fluke. to that calculation we must then add a player's network value, which means that it seems like a viable strategy to make the game fun and attractive for everyone, even if only a few manage to contribute to the core scientific challenge. 
generating sustained engagement at less complex levels of game participation may also be a way to gradually build the loyalty and skills needed for a flâneur to transition into a hero. a real concern, which our empirical work aims to address, is, however, whether the average hero's preferences and play trajectories diverge substantially from those of average players. from a psychological standpoint, it is important to mention the difference between intrinsic and extrinsic motivation, as they apply to participant retention in citizen science projects. there are competing schools of thought on the matter (deci, ; malone, ), but they agree on core tenets: in extrinsic motivation, action is based on outside rewards like money or socially valued praise, or avoidance of unpleasant states like scolding. this entails a tendency to act half-heartedly and cease the behavior when the outside factors dry up (deci, koestner, & ryan, ). in a state of intrinsic motivation, on the other hand, the user is driven by desires to participate, explore, learn and master the activity in itself. extrinsic design-logics are commonly found in vulgar points-badges-leaderboards (pbl) gamification, where mainstream ideas (e.g. bowser, preece, & hansen, ; marczewski, ) resemble diluted versions of behaviorist token economy (e.g. kazdin, ; skinner, ) and th century utility-based economics brought into question by behavioral economists like kahneman ( ). extrinsically informed motivational design places great importance on the magnitude and temporal distribution of rewards, as well as their psychological framing. intrinsic motivation theories, on the other hand, prescribe design structures that support self-determination via a sense of competence, autonomy and social relatedness (deci & ryan, ; rigby & ryan, ), or challenge, fantasy and curiosity (malone, ), plus interpersonal factors like recognition, competition and cooperation (malone & lepper, ). for an overview, see figure below. fogg and eckles' ( ) "behavior chain for online participation" can be used to conceptualize phases of user involvement. the discovery phase involves "learning about the service" and "visiting the site". once they have arrived on the site, users have plenty of opportunities to explore the information available and get the chance "to be educated and influenced". for example, the quantum moves site has videos and photos integrated. also, the website hosts a forum, where visitors can read discussions of registered players and get the chance to be exposed to the game. in the superficial involvement stage, users are influenced to "decide to try" and "get started" with the game. at this stage the structural gameplay will fluently guide them through the tutorial levels and hopefully prompt them to create a profile and validate an email activation link, so they can save their progress. it is only in the final phase, called true commitment, that users generate large added value. in quantum moves this translates to users making a valuable contribution to either the science, the game development, or through their lifetime network value. such examples range from the creation of posts and video materials to creating levels, but contributors' core commitment to quantum moves is measured in play counts and scores.
this -phase process mirrors the conceptual difference between interest (berlyne, , ), motivation (grant & dweck, ; jensen & buckley, ; prestopnik & crowston, ; raddick et al., ; ryan, rigby, & przybylski, ) and sustained engagement (kular, gatenby, rees, soane, & truss, ; rigby & ryan, ; skinner, seddon, & postlethwaite, ). interest can be understood as the immediate psychological allure of something encountered in the world, usually as a property of perceptual processing (berlyne, , ), exacting a motivational pull of curiosity to investigate. engagement (kahn, ) includes motivation, behavioral change, and persistence in an activity. to acquire engaged players across the beta period, we used four separate recruitment strategies. the early parts of the game (see below) were tested in high-school science classrooms and our own university lectures, where large numbers of students were forced to sign up. we also had the opportunity to speak at several high-profile events, such as a couple of public lectures with over people combined, which we used for an a/b-test of certain game elements (see below). the project has additionally garnered a good deal of attention in traditional media and online, which has generated substantial influxes in identifiable jolts. finally, a fourth group of ongoing clickthrough can be attributed to community efforts and general buzz arising online and in-world, making their origin essentially unknown. since many players cease participating quite quickly (a common pattern in games and citizen science projects alike, but especially prevalent for students forced to participate), we also announced several "featured challenges", where existing players were prompted to come back and knuckle down on particular levels. we awarded extra in-game badges for participation, and offered combinations of books, logo mugs, t-shirts, and even a lab visit with all expenses paid (see below) to top contributors. we can thus envision a two-dimensional space for motivational devices, with one axis constituted by in-game (points, progress, good core gameplay, community, etc.) versus in-world (our talks and teaching efforts, prizes, recommendations from friends) situatedness (lieberoth & roepstorff, ; stevens, satwicz, & mccarthy, ), and the other ranging from intrinsic (science participation, challenging gameplay, fun, fascination with physics, etc.) to extrinsic (physical prizes, mandatory high school participation) motivational flavors. game designs based on intrinsic motivation are thought to satisfy the criteria through gameplay processes alone (the core loop and the structural gameplay surrounding it), while our recruitment strategies in educational settings and prizes in featured challenges can be said to be classically extrinsic. extrinsic reward has been found to have a detrimental effect on intrinsic motivations (deci et al., ), but it is worth noticing that points, leaderboard placements and even physical prizes for top performance may act as intrinsically motivating feedback, social devices and signs of mastery. but people may also already have interests and preferences in the real world that make citizen science participation intrinsically motivating, as for instance seen in player data from galaxy zoo, where the most prevalent motivators were not related to the site's superficial gamification devices, but rather a fascination with outer space or a sense of participating in something greater (raddick et al., ).
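the two-dimensional space of motivational devices described above can be encoded very simply; the device list and quadrant assignments below are illustrative assumptions, not an exhaustive mapping.

from collections import defaultdict

# each device is placed on the (situatedness, flavor) plane
devices = {
    "points, badges and leaderboards": ("in-game", "extrinsic"),
    "challenging core gameplay": ("in-game", "intrinsic"),
    "community forum": ("in-game", "intrinsic"),
    "physical prizes in featured challenges": ("in-world", "extrinsic"),
    "mandatory high school sign-up": ("in-world", "extrinsic"),
    "fascination with physics": ("in-world", "intrinsic"),
}

quadrants = defaultdict(list)
for device, coords in devices.items():
    quadrants[coords].append(device)
for (situatedness, flavor), members in sorted(quadrants.items()):
    print(f"{situatedness:8s} x {flavor:9s}: {members}")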
as such, creating engagement is usually a question of balancing design strategies rather than relying on a single approach, such as vulgar pbl-gamification. recruitment activities in-world and online, an engaging in-game core loop, a structural gameplay to frame, structure and motivate the player's continual progression through the levels, as well as an active community where participants get a sense of continually contributing to science, are all central components of the strategy laid out in the hope of realizing the scientific goals of quantum moves. in the remaining sections, we describe the game design process as it has unfolded in a sometimes stumbling but always creative fashion, with its many different ideas and theoretical rationales. we then report statistics about users, recruitment and retention, and the attributes that characterize our small crop of heroes so far. we developed quantum moves as the first game under the scienceathome.org umbrella, one of the first citizen science projects in the quantum physics segment. the initiative is developed within our cross-disciplinary centre for community driven research (coder), where we aim to bridge theoretical and experimental research with online community efforts. the early development of quantum moves, previously known as the quantum computer game, started in december with the first coding iterations deployed in matlab, a numerical computing environment widely used in physics. the decision was based on the program's plotting flexibility and the programming experience available in the student pool. an early version of the game was ready in february (see figure a. in the appendix) and subsequently tested in several danish high schools. this served as an overall proof of concept, yet we noticed that several challenges hindered the game experience, mainly due to matlab's limited portability and graphic support. over the following months, we improved various aspects of the initial prototype. however, a test session with volunteers in the summer of made it clear that, if we wanted the game to have a large public appeal, we had to abandon matlab and rewrite the code in a flexible, end-user accessible programming language. the best solution at that time was java, as it addressed most of our concerns about end-user accessibility, robustness, flexibility in programming and variety in graphics. a preliminary java edition came out in early october (see figure a. ), and a testing version was made available to a small group outside the coder team in december (see figure a. , a. , a. ). the first, fully-fledged beta version was publicly launched and advertised on the th of june (see figure a. ). the game was structured around levels: tutorial, arcade (a series of games where players could practice), scientific (where we included the games we considered most relevant for our lab research) and user space (a sandbox environment, where players could design their own games or try those created by other users in the community). when the tutorial games were successfully completed, a user was allowed to roam around and try any of the other games found in the arcade and scientific levels. the launch of this beta version was covered in several national media outlets, e.g. national geographic and videnskab.dk. the media attention played a definite role in making the game known to an audience beyond the usual high school students. yet, we experienced several glitches with the university server which slowed down our initial success.
a brief follow-up survey sent personally to the top players after launch pointed out that the technical issues were compounded by frustrations with the abrupt increase in difficulty, starting with the tutorial. also, looking at the data gathered in the days after the release, we noticed that once players got past the tutorial, they predominantly chose to play in the scientific level, with insufficient time spent in the arcade level intended to help them acquire the necessary core skills. this became a premise for our discussions and, as the team expanded to include a business school graduate and a psychology researcher, we focused on reevaluating the game design. in august , the current beta version was introduced (figure a. ). from this point on, we refer to that version in this paper. early on, we were confronted with two pressing aspects that needed to be addressed: one was redesigning the tutorial to create a lower entry barrier, while also ensuring an effective learning curve. the other was to create a game structure that would enable players to hone the skills needed to perform with high effect in the scientific levels, hopefully turning some into heroes. we started by operationalizing the main physics concepts applied in the game and their equivalent in-game operations into a set of core skills that players would need to acquire: deceleration of the atom speed, tunneling the atom into the target state and stabilization of the atom state. these became our guiding references for the structural redesign of both the tutorial and the advanced levels. firstly, we reorganized the tutorial into a set of games which gradually introduced the main physics elements and core game loop: starting with the atom as a ball, continuing with the atom as a wave, and finishing with the insertion of a static obstacle. to ensure that players had appropriate visual scaffolding and understood the goals of each challenge, we added video animations preceding each level, presenting one possible trajectory along with written hints. the successful completion of the last tutorial level allows players to access the main menu, with the option of playing in separate skill labs: cool, tunneling and control. we decided to create predetermined paths, presented in a tree structure (see figure a. ). by doing so, we aimed to push players to go through the skill training levels before the scientific levels where the main contributions to our current citizen science problems are made. since individual levels differ in difficulty and require refinement of particular in-game operations, each skill lab was divided into a bachelor and a master section. the user has to successfully complete - levels to acquire skill badges on their profile, which would unlock specific new levels. once the bachelor level was completed, players would gain partial access to both the master section and the scientific labs: qcomp and beat ai. to achieve full scientific access, the master's would have to be completed at some point. beyond this, the structural gameplay tree consisted of a few more nodes: finetune, created as a more theoretically grounded option, contains a set of tools that make it easy for a player to manually change points on the path of his previously played levels, adjust the timing, stretch or shrink the total time or smooth the path. since the tool functions are not automated, it is ultimately up to the user to decide which parameters are to be changed in order to optimize the path.
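the unlocking logic of the structural gameplay tree sketched above can be illustrated as follows; the level names and completion requirements are simplified assumptions, not the actual game configuration.

# skill labs with bachelor and master sections (hypothetical level names)
LABS = {
    "cool":      {"bachelor": {"cool 1", "cool 2", "cool 3"}, "master": {"cool m1"}},
    "tunneling": {"bachelor": {"tunnel 1", "tunnel 2"},       "master": {"tunnel m1"}},
    "control":   {"bachelor": {"control 1", "control 2"},     "master": {"control m1"}},
}

def unlocked_areas(completed_levels):
    # return the areas a player may enter, given the set of completed levels
    areas = {"tutorial", "user-defined space"}  # always open after the tutorial
    bachelors_done = all(lab["bachelor"] <= completed_levels for lab in LABS.values())
    masters_done = all(lab["master"] <= completed_levels for lab in LABS.values())
    if bachelors_done:
        areas |= {"master sections", "scientific labs (partial: qcomp, beat ai)"}
    if masters_done:
        areas.add("scientific labs (full access)")
    return areas

print(unlocked_areas({"cool 1", "cool 2", "cool 3", "tunnel 1", "tunnel 2",
                      "control 1", "control 2"}))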
the user-defined space is open to all players, regardless of skill. it includes construction yard, a sandbox type of environment where users can build their own levels with goals and obstacles, and playground, a space where users can play each other's creations. compared to the predefined games, these are not connected to the main tree structure, and can be played immediately after completing the tutorial. therefore, we decided not to set any skill requirements for creating games. furthermore, since the user-created games are primarily supposed to be used as community building drivers, they are neither checked for solvability, nor included in any of our data analysis processes. based on the number of plays, the user-created games are ranked on the "most popular games in the community" leaderboard. the last and newest branch, added in early , combined predefined and user-created games in challenges with the intention to enhance competition and the community feeling. for instance, the newest game type is called "quantum quests". here we catered to achievement-oriented players (heeter, lee, medler, & magerko, ; sherry & lucas, ), who crave more traditionally "gamy" elements such as finite super mario-like lives. building on iron man-type game principles, this game type consisted of a series of selected games, where a player has to progress as much as possible through the levels with only a limited number of attempts, called "lives". it is currently being used as a base for creating more immersive d levels using the unity game development platform. to create a sense of progress in quantum moves with minimal effort, we implemented the simple points-badges-leaderboards (pbl) mechanic recurrent in casual games like candy crush saga and shallow marketing gamification. in quantum moves, the score is calculated on the basis of various parameters, such as time penalty, overlap with the target state, and points collected while avoiding the obstacles, and is presented to players in the end screen triggered by the completion of a game sequence. based on the obtained score, a player would move up or down on the leaderboard, presented in the lower right corner of the game window (e.g. fig. a). to mark a player's progress, his score on the leaderboard is shown in relation to similar scores in his range. this is similar to the techniques found in social games, where the scores of a player are presented in relation to others in his/her social network. we made use of two feedback techniques found in gamified applications (groh, ) and mobile games to introduce visual cues that could enhance a player's sense of progression (deterding, ). the first is a bar presented at the top of the game interface, which provides players with real-time feedback on their performance. the bar changes its color from red to yellow and finally to green to suggest the effectiveness of a player's trajectory. mirroring the well-known concept of stars found in games like candy crush saga, we also introduced "atoms" as an alternative three-tier grading system reflecting acceptable, good and excellent performance thresholds for each level, indirectly challenging players to revisit levels and perfect their collection. the potential downside is that once a game is completed with three stars, players might lose the motivation to further optimize their score.
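the scoring and feedback mechanics described above can be sketched as follows; the weighting of the parameters, the color thresholds and the atom-grade cut-offs are assumptions for illustration, not the formulas actually used in quantum moves.

def level_score(overlap, elapsed, time_limit, bonus_points=0):
    # overlap with the target state dominates; time beyond the limit is penalized
    time_penalty = max(0.0, elapsed - time_limit) / time_limit
    return max(0.0, 100.0 * overlap - 20.0 * time_penalty + bonus_points)

def feedback_color(overlap):
    # real-time bar at the top of the interface: red -> yellow -> green
    return "green" if overlap > 0.8 else "yellow" if overlap > 0.5 else "red"

def atom_grade(score):
    # three-tier "atoms", analogous to stars in casual games
    return 3 if score >= 90 else 2 if score >= 70 else 1 if score >= 50 else 0

s = level_score(overlap=0.92, elapsed=4.2, time_limit=5.0, bonus_points=5)
print(s, feedback_color(0.92), atom_grade(s))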
the last technique to indicate progress is acknowledging specific achievements in the form of badges on the player profile (antin and churchill, ; denny, ). we designed a series of possible badges for two types of achievement: performance and engagement. depending on performance at various junctions of the game, and cumulative time spent logging in and interacting with the game universe, a player could receive recognition for reaching a specific milestone, such as a "bachelor degree", or a particular threshold of play counts like "quantum frenzy". to test the effectiveness of the pbl framework and constrained paths of play on player engagement, we also set up a randomized a/b-test surrounding a major event in august , where around people would attend talks about quantum moves and the quantum computer. the literature on gamification is rife with pundits clamoring about the efficacy of badges in motivational design, but there is not much quantitative data to back them up, so we jumped on the opportunity to test whether giving badges to players at various junctions would be a significant motivator for repeated visits and engaged gameplay. at the time, we needed to figure out how best to guide players on their path to the scientific levels, without creating barriers to progress and the feeling of competence. we had conflicting hypotheses about how to build this into the structural gameplay: on one hand, the central psychological usability (norman, ) and choice architecture (thaler & sunstein, ) principle of guiding constraints recommends that a system should have a certain amount of openness, but the basic interface for core operations should be limited to the most preferable options. locking levels until the necessary skills are accumulated fits this view. on the other hand, we want players to follow their curiosity, maximizing autonomy, motivation, and access to the science levels. in this view, the structural gameplay would not need stringent locks based on skills, but just the tree-like lab structure laid out as an open map, with friendly hints about where to go if a level proves too difficult. to this end, an "a/b test" was conceived as a x factorial design, randomly assigning the expected new players to one of four conditions: locked levels or open levels, with or without badges. unfortunately, the test was crippled by both time constraints and technical difficulties. our programmers were working overnight to implement the system to automate a/b-test cell assignment, but also had a lot of more pressing design issues to address before launch. in the end, only a fraction of the badges we had designed were implemented, and players only got weak cues when achieving them. since we did not have time to test it, the skill system unlocking levels also seemed somewhat counterintuitive. some old server issues also came back to haunt us on the day, so many invitees were probably unable to log on to the game after creating their account. with these limitations, the usable data covered only a fraction of the people assigned to each condition (down to n= player in the "open levels with no badges" cell). the results were statistically insignificant, and will not be reported below. we are awaiting new opportunities to gather this kind of data, which we believe is central to both our own work and to adding a much-needed controlled evidence base to the academic discussion of gamification in general.
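a minimal sketch of the randomized assignment to the four a/b-test cells is shown below; the deterministic per-user seeding is an assumption about how returning players could be kept in their assigned condition.

import random

CELLS = [("locked levels", "badges"), ("locked levels", "no badges"),
         ("open levels", "badges"), ("open levels", "no badges")]

def assign_condition(user_id: int, seed: int = 42):
    # mix the user id with a fixed seed so the same player always gets the same cell
    rng = random.Random(user_id * 100003 + seed)
    return rng.choice(CELLS)

assignments = {uid: assign_condition(uid) for uid in range(8)}
for uid, cell in assignments.items():
    print(uid, cell)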
as these game design choices were being implemented, we collected data on user acquisition, play trajectories, and the few dedicated players we call heroes, from the beta launch in to the writing of this paper in april . the results are reported in the next section. between the th of november and the th of april , there were approximately users visiting the www.scienceathome.org website, % of these being new unique visitors. average visits lasted minutes, with users visiting an average of pages on the website. ip addresses came from at least countries (see figure ); yet, the country tally could be even higher, as we could not assign a country of origin to approximately users. at this point, most registrations originate from denmark, followed by germany and the united states. in the following analysis we present results derived from data collected over a period of months, from the beta version launched on the th of june until the th of april . the sample relevant for the demographic and human computation data is equal to a cohort of people. % of all players actually finish the tutorial (see figure ), which is an encouraging number. figure illustrates early drop-off, as people move from superficial involvement to commitment via the mandatory tutorial levels. these data were collected from all users registered between the th of july and the th of march , which is equivalent to a total sample of players using the newest version of the tutorial. as can be noticed in figure , the drop-off mainly happens during the first six levels (m= %, % ci [ %, %]). however, the th tutorial level displays a large discrepancy between the number of attempted versus completed plays. a one-way anova was performed to see if the th tutorial level (m= %) belongs to a different normal distribution of completion ratios than the first six levels (f( , ) = . , p < . **). this indicates that the th level has a significantly lower completion ratio than the first six tutorial levels, suggesting that the th tutorial level, the first level that includes the zones of death which the atom is not allowed to touch, constitutes a significant barrier to an otherwise fluid flow of progress, most likely because it proves too difficult or disrupts competence motivation. an alternative explanation could, however, be that non-committed visitors simply realize that they have now seen all the game has to offer, and either stop because their flâneurs' curiosity has been satisfied, or because they are not sufficiently attracted by the gameplay to engage further.
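the kind of comparison reported above can be run with a standard one-way anova; the sketch below uses made-up per-play completion indicators (a simplification of the per-level completion ratios actually analyzed), so the numbers are placeholders only.

from scipy import stats

# hypothetical per-play completion indicators (1 = completed, 0 = abandoned)
first_six_levels = [1] * 520 + [0] * 60   # pooled plays of tutorial levels 1-6
seventh_level = [1] * 300 + [0] * 190     # the level introducing the zones of death

f_stat, p_value = stats.f_oneway(first_six_levels, seventh_level)
print(f"F = {f_stat:.2f}, p = {p_value:.3g}")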
as detailed at the top of section , during the beta period we used four strategies to recruit players. the classifications online/media and voluntary by talks were based on all registrations in a ± day time span before and after a registered major online or in-world event. the days before the event were included to account for the users that wanted to check out the game in anticipation of the event. to avoid overlap with online/media, we attributed the voluntary by talks tag only to the users registered in the same country as where the event took place. the classification forced by talks was given by a tag added at the point of registration. the results of these recruitment efforts are summarized in figure , which displays increases in unique users attributable to one of the main groups. a large percentage of our registered beta users originate in the online/media group. the steep initial rise of this curve can be attributed to national media coverage of the game launch in june , while the stepwise increases beginning around day are due to ongoing publicity. from mid-september , we increased our communication efforts by setting up a content strategy for the blog and forum on scienceathome.org, where players ask and answer questions related to the game, report bugs or request new game features. these efforts were complemented by a social media presence, with the launch of a facebook page and twitter account, and an increased focus on producing video content for vimeo. also, more attention was directed towards a proactive public relations approach, in order to ensure the recognition of the game on established human computation websites and blogs like scistarter and citizen science centre. following this, we noticed an increase in the number of other sources mentioning the game or giving positive reviews referring back to our website. to analyze if any of our data could indicate a predictor for the number of play counts (see figure ), a one-way anova test for differences in the number of play counts generated per active day for each player was performed between the four registration origin groups (see table ). this showed a significant between-groups difference (f( , )= . , p< . **). in order to discover which groups differ from each other, a tukey-kramer post-hoc comparison was performed, showing that both forced by talks and voluntary by talks differed significantly from online/media (p= . * and p< . ** respectively), unknown (p= . ** and p= . ** respectively) and each other (p< . **). online/media and unknown did not differ significantly from each other (p= . ), suggesting that these two large player groups were qualitatively similar, and revealing a pattern where people who actually went out the door in-world of their own accord are highly motivated to play, compared to those who found their way to us online, and especially the poor souls forced by teachers, who only played a little. a similar procedure was conducted for physics interest, years of physics education beyond th grade, and finally between male and female players (all summarized in table ). the one-way anova shows significant differences for physics interest (f( , )= . , p< . **), gender (f( , )= . , p< . **) and physics education (f( , )= . , p< . **). tukey-kramer post-hoc comparisons show that high interest in physics leads to significantly more play than low and middle interest (both p< . **), while the latter two do not differ significantly (p= . ). for physics education beyond the th grade, the post-hoc comparisons show that the very large group with - years of schooling played much more than those with - years (p= . **) and - years (p< . **) of additional schooling behind them, which did not differ significantly from each other (p= . ). the game framing of our citizen science project thus seems to speak to people with a notable casual interest in physics, but without too professional a background. we were also interested in who performed well after having completed the tutorial levels, so we calculated the average score on the tutorial levels (the most stable figure, since players may have chosen many different paths after the th tutorial level) for the four registration groups, interest, physics education beyond th grade, and gender (see table ). a one-way anova reveals that the registration groups differed significantly on the average score reached on the tutorial levels (f( , )= . , p< . **). the tukey-kramer post-hoc comparisons show that forced by talks differed significantly from online/media (p< . **) and unknown (p= . **), but not from voluntary by talks (p= . ). voluntary by talks surprisingly differs from online/media (p= . *) but not unknown (p= . ), which do not differ from each other (p= . ).
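the anova and tukey-kramer post-hoc procedure used above can be reproduced with scipy and statsmodels, as sketched below on randomly generated placeholder data for the four registration groups.

import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
# hypothetical play counts per active day for each registration group
groups = {
    "forced by talks": rng.poisson(2, 60),
    "voluntary by talks": rng.poisson(7, 30),
    "online/media": rng.poisson(4, 200),
    "unknown": rng.poisson(4, 150),
}

f_stat, p_value = stats.f_oneway(*groups.values())
print(f"one-way anova: F = {f_stat:.2f}, p = {p_value:.3g}")

values = np.concatenate(list(groups.values()))
labels = np.concatenate([[name] * len(v) for name, v in groups.items()])
print(pairwise_tukeyhsd(values, labels))   # pairwise group comparisons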
interestingly, the forced by talks group performed best, perhaps because they received better in-world instructions than those tackling the tutorial only with in-game cues as scaffolding. but it is also worth noticing that these players forced by extrinsic means were much less likely to complete the tutorial levels in the first place, so the population registered here is likely to reflect only the most tenacious and intrinsically motivated members of the cohort, who might also have engaged fully in the game on their own time. no significant effect on tutorial scores was found for interest (f( , )= . , p= . ), nor education (f( , )= . , p= . ), but there was a significant difference between the genders, with females significantly outperforming their male competition (f( , )= . , p= . **). to assess how quantum moves retained players recruited by the four different means, we calculated the number of active days, defined as a day with at least one play count, for each user. a score of can thus mean someone who plays intensely for days on end never to return, or someone who returns a day per months for half a year. a look at the drop-off curves in figure indicates that % of the users drop out within days and never return to the game. a one-way anova comparing the drop-off rates did not show a significant difference between the four registration groups. however, players that stayed for more than active days were all recruited at public events, and subsequently signed up voluntarily. these players are the ones we refer to in our paper as "heroes". sadly, demographic data is available only for of them. even though the sample is small, we present a comparative analysis of the heroes versus more casual players in quantum moves. in figure , it can be noticed that the results derived are based on a sample of "heroes" and casual users. a considerable part of our work has converged on heroes: that is, identifying players who significantly contribute in the science levels based on existing personal attributes like talent, interests and cognitive surplus, and in turn designing structural gameplay to help more casual players cross the uct through a combination of motivation and fluid skill acquisition. a wilcoxon rank sum test for equal medians was conducted to compare heroes to more casual participants on interest in physics (reported on a likert scale ranging from - ) (hero mean= . , sd= . ; casual mean= . , sd= . ) and years of education related to physics (e.g. high school science) (hero mean= years, sd= . ; casual player mean= . years, sd= . ), revealing no significant predictive power for who becomes a hero on either of these.
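the hero-versus-casual comparison can be reproduced with a wilcoxon rank-sum test as sketched below; the likert ratings and years of education are hypothetical placeholders, not the actual survey responses.

from scipy import stats

hero_interest = [5, 4, 5, 4, 5, 3, 4]                    # likert ratings
casual_interest = [4, 3, 5, 2, 4, 3, 4, 5, 2, 3, 4, 3]
hero_education = [1, 0, 2, 1, 0, 1, 3]                   # years of physics schooling beyond high school
casual_education = [2, 1, 0, 3, 2, 1, 0, 2, 4, 1, 1, 2]

for name, heroes, casual in [("physics interest", hero_interest, casual_interest),
                             ("physics education", hero_education, casual_education)]:
    statistic, p_value = stats.ranksums(heroes, casual)
    print(f"{name}: z = {statistic:.2f}, p = {p_value:.3f}")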
firmly believing in the power of mixed methods triangulation, however, we also have some interesting qualitative data to illuminate this. out of the identified heroes, we had direct contact with three, namely sus, shb and meilby. for privacy reasons, we will refer to them by their quantum moves user names. due to their early contributions, sus and shb were invited as special guests to an offline community event and lab visit on the th of november , where they gave brief interviews to help us understand player motivations and promote quantum moves. the semi-structured chat was based on questions related predominantly to their motivation to play quantum moves and their preferred game elements. contact with meilby was established via email. his response was among those we received as a reply to the email sent out after the beta version launch in june . in this email, the questions were formulated around the strategies top players used to obtain above-average scores in particular games, as well as around their impressions with regard to the game structure and the newly launched features. sus (f, years old) was working as an accountant, based in copenhagen, denmark, when we met her. she signed up as a player in quantum moves after a public talk on the topic of information security and quantum computers, held by jacob sherson at the experimentarium on - august . since that date, sus had active days and play counts. this is noteworthy as, according to her own testimony, sus had never played a computer game until she signed up for quantum moves. when we asked sus what motivates her to keep playing the game, she mentioned the "knowledge that the results will be used for real, scientific purposes" (recorded interview with sus, th of nov ). sus' case helps us to understand that quantum moves does not necessarily acquire its heroes from a traditional gamer demographic (although women + are the quickest expanding gamer group, software entertainment association (esa), ), and should take this into account for recruitment, structural and core game design, and framing purposes. the drive to help science by playing a computer game was also what shb (f, ) mentioned as her main motivation to sign up for quantum moves. shb is a danish high school student and a member of the unf, a youth organization which aims to attract and develop young talents into the natural sciences. just like sus, shb signed up as a player after the public talk at the experimentarium on - august . she has had active days and play counts. when asked what motivates her to keep playing quantum moves, she answered "it is fun to play, among other things because i know (that by doing so) i help science" (danish-english translation of an excerpt from a recorded interview with shb, th of nov. ). shb's case suggests that in-world membership in scientific communities of interest may provide a fertile ground for future recruitment. meilby joined the quantum moves community early and was part of testing several previous editions of the game before the beta version launch in august . in the selected time span, he had a total of active days and cumulative level plays. meilby drives a taxi and has no physics education beyond the standard level obtained by attending high school. his case becomes interesting because he reported being able to replicate, through several iterations, the correct sequences of actions needed to perform an operation known in physics as "quantum tunneling" based on theoretical reasoning rather than trial and error (for a full description, see appendix . ). based on his formal education, he would not have sufficient knowledge to analytically translate the physics concepts used in the game.
meilby thus exemplifies the contrast between several hypotheses we have about how a hero manages to move to the top of the leaderboards: through explicit theoretical knowledge about quantum principles (as expected by the physicists in our group, who advocate focusing on explicit semantic knowledge in player skill acquisition), through implicit familiarity and complex processes of predictive coding in the core hand-eye coordination used to play (friston, ; scott & dienes, ), or some combination of these higher- and lower-order cognitive processes, unique to the human mind/brain/body complex, allowing human players to explore, tinker and strategize their way through difficult and counterintuitive physics problems via more or less conscious trajectories that no computer algorithm would ever fathom. in this paper, we have tried to open up our design process for quantum moves, and reported significant findings about factors predicting play activity and performance, ranging from gender to recruitment. notably, we have devoted time to understanding the most tenacious and talented kind of players, whom we label heroes. quantum moves gets most of its participants online, but our beta data reveal a much more interesting pattern: people who were very interested in physics but had little formal background in the field, and who actually at some point went out the door to meet us, played much more tenaciously than those who clicked their way to www.scienceathome.org online, or were forced by extrinsic means. thus, genuinely interested amateurs reached in-world seem to be the most fertile ground for recruiting intrinsically motivated citizen cyberscientists to games like quantum moves. having attended talks with real physicists also seems to help people achieve better scores once they start playing. for citizen science, the relationship between in-world and in-game motivation is nontrivial. as for the actual human computation, the first year of quantum moves has been devoted to "calibrating our users" to solve various challenges (first scientific results will be published elsewhere shortly). we purposely abstained from helping the users beyond teaching them the basic game mechanics, even when we knew of efficient solutions to a problem. this choice provided an unbiased ensemble of solutions, which has two advantages: first, it contains solutions we would never have thought of, and second, it provides a measure of what the users can realistically contribute. we use this to compare user learning curves with computer algorithms as they both find solutions for the same level. the score of a user after each try is compared with the score the computer would have after the same number of iterations. this has shown that users are fast at obtaining a good score, but they will never obtain the perfect score, whereas the computer algorithms need a lot more iterations to obtain a good score, but they are capable of finding the solution with a level of precision which yields the perfect score that we need. however, for a level like bring home water fast the users try out solution patterns which the computer algorithms would never find by themselves, and they discover solutions which are faster than the computer algorithms' more linear computational trajectory. getting to know our player base and especially the heroes reveals that citizen cyberscience games like quantum moves cater to a nontraditional demographic compared to both science and gaming.
for instance, female players significantly outscore males after having played the tutorial, even though males play more per day. our data comparing those forced to play via schoolwork to players who came to us via in-world events or online channels also suggest that users are much more inclined to play if the action springs from curiosity or if players are intrinsically motivated to contribute to science, supporting earlier findings by raddick and colleagues ( ) as well as the central tenets of self-determination theory. what is less clear, however, is which game elements help engage and retain players once they have been recruited, and begin to move from superficial to true commitment. while quantum moves is unique compared to other citizen science games in having an engaging and challenging core game loop that by itself lives up to prominent definitions of (casual) games (juul, ; salen & zimmerman, ), we also expect that a well-designed structural gameplay (sometimes called metagame) is central to frame, structure and motivate the play experience, both helping and goading players to move from level to level along appropriate learning curves balanced between boredom and anxiety. in the end, we accept that the high level of cognitive complexity in quantum moves, compared to citizen science games that rely on players as simple pattern-hunters or "mules" carrying data gathering devices into the real world, means that we will lose players at a higher rate due to the difficulty, but we believe that it is a viable strategy to work towards a game that is fun and learnable for everyone: on one hand to capitalize on each player's lifetime network value, and on the other in the hope of helping players hone the three skills central to succeeding at the scientific levels, moving them beyond the user contribution threshold. we were vague about the difference between a hero and a player who crosses the uct. this is because we view hero status as a significant individual property combining sustained engagement and a high level of skill, while crossing the uct can happen to anyone by a fluke or one good gameplay. in this sense, we conceive of our heroes less as loincloth-clad conan-types who individually topple empires and more like the ww trench-heroes who made up the bulge with grit, skill, and intrinsic determination, and together make a difference worth reverence. however, even though many players may be truly engaged and come back for several days, we still need deeper analysis of the scientific results to tell how many actually manage to cross the uct by accident and/or by attaining hero status, that is, who actually manages to contribute to the quantum computer, as our beta population has mainly guided us in designing an effective game interface, and in testing simple hybrid optimization against only specific algorithms. the limits to the current study are manifold, as we have compiled this contribution as an open midstream report rather than waiting for the best possible data years into the future. centrally, the n of active players, and especially heroes, is limited, and mainly stems from a danish context. as such, new patterns are likely to emerge when we establish an even wider international profile. also, the many anovas run are almost certain to show some kind of statistically significant results, even though they may not be practically significant.
mancovas would have been preferable, but since the population for each test differed slightly, these were the most appropriate statistical tests available at this point. further analysis and data gathering is clearly needed. in this instance, our a/b test was unable to show a statistically significant role for challenging constraints, openness to explore or the infamous badges. the lesson learned is that large-scale tests should only be conducted after a satisfactory alpha test of the gameplay in each cell, and ideally supported by using qualitative measures of player engagement and interaction patterns to allow triangulation of the causal mechanisms shaping gameplay trajectories. the games industry might not have time for this kind of detailed study, with most casual games being developed by small expert teams in less than months, but as scientists, we should. so much the wiser, we still hope to one day lead the citizen cyberscience games community in methodologically sound testing of design hypotheses about recruitment, engagement and different kinds of motivation. the relationship between curiosity on one hand, and intrinsic versus extrinsic motivation on the other, goes some way toward explaining the steep and permanent drop-off in player engagement as players make their way through the tutorial, and is especially seen after quantum moves teaching and recruitment events, where people may have been forced to register and thus driven primarily by external factors. a steep drop-off curve is in no way unusual for online games (fields, ), so a full % tutorial playthrough suggests that the levels encountered early in the behavior chain actually keep players interested for a good while. future design perspectives include building better just-in-time feedback into the core game loop, making it more 'juicy' and helping players understand exactly what is going well or wrong at any moment. after a year of stalling due to programming pool limitations, we also finally have the capacity to integrate the game with social media such as facebook, which will add an entirely new social dimension to both gameplay and recruitment. finally, we have yet to narratively structure the structural gameplay and implement truly juicy graphics. as our community manager noted, we do not need to become the next candy crush saga in terms of juiciness and engagement, just the candy crush saga of physics. we are now in a position to deploy more advanced psychological methods in mapping player trajectories and trying to predict who might be born heroes and who can become ones, which should significantly inform our future design of the structural gameplay. our centre's unique cross-disciplinary nature allows us access to eye-tracking and other tools that will soon help us test interaction design, and conflicting hypotheses about the usefulness of explicit understanding of the physics issues (as expressed by meilby) versus implicit learning of the hand-eye coordination and establishment of predictive coding (bubic, von cramon, & schubotz, ; friston, ) to account for the counterintuitive movement of the 'liquid' wave. as another exciting step, we will soon be able to report the first comprehensive psychometrics battery collated especially for citizen science gaming, in collaboration with other major players.
the revolutionary bit will, however, not be our data collection, but the analysis of complex relationships between hosts of in-game and in-world variables via our next game: a title both for teaching statistics and engaging our community in analyzing data with advanced techniques like path analysis, which require intelligent human causal propositions and critical evaluations of other users' solutions, something no algorithm can ever do. human optimization is superior to the computer in many instances, but we have yet to understand the exact cognitive nature of these solutions as they apply to the varying problem spaces in quantum moves. this, along with more work on motivation, will be the subject of extensive future research. contrary to commercial casual games, we cannot change or abandon our core game loop, which represents the real quantum optimization processes needed to operate a scalable quantum computer. we can, however, change the look, feel, usability, and structural gameplay surrounding our game to frame, structure and motivate more engaging play trajectories, and ensure a sufficient learning curve to help people cross our high science contribution threshold. we have opened up that messy and experimental design process in this inaugural issue, hoping that others will be able to learn from our experiences. we encourage other researchers to publish future human computation pieces along the same lines. the present study would not have been possible without all the citizen scientists and casual players on http://www.scienceathome.org/. this is your research as much as ours.
merge: "merge was a pain (…) and since i did not really figured out how to get a single atom to the excited state i gave up kind of fast. i got the best score i think it was , , and i was a little mad about being so bad at the challenge because the score went to as max, and . is just not even near it. well i had some time off, and i tried again a, or some days after. with no luck. then i said to myself, since i was not doing any progress, and did not know how i should get the first atom to the excited state, i started doing the excited challenge again, and found the only way i could do it. since the background was with lines i started using them, i started raising the curve with the first atom to just above the second atom (according to the lines) and figured i was letting them merge too fast because the first atom was what i would say too excited, and not going to relax. then i had to be patient, when they were merging, and with some patience letting it flow in itself when they got near each other. the first times it did not work out, instead of the st atom splitting into it split into , and i knew i had it i just needed more tries. then it happened after some cursing it split into two, and i got a score points, which i was shocked about, i was like wtf, i just did it, and i am getting nothing! i think it is too hard to do, and the fact that you cant do it fast is giving me stress, the fact i have to wait all seconds every time to make it happen is a reason why it's not so funny to do. (i guess i played too much, and hate the fact i can't get it done right, pisses me off j) then i started to finetune my score, i knew already it was done right because it split into and the nd atom was really relaxed. it took me some time to become familiar with the finetune part, but after spending some time in there, i finally got something, by zooming in and moving small parts and watching the score i got to points.
i used the smooth tool early on, because it was flickering the line in the middle graph, and after i had dragged the line around and around, it started to flicker (i mean the line becomes zig-zagged), i believe it is not good for the score, but i'm not able to remove it, i tried the locking tool to lock the line around the part and then use the smooth tool on the not locked line but it fucked it all up. i've added some pictures to show you what happened."
badges in social media: a social psychological perspective
a theory of human curiosity
novelty, complexity, and hedonic value
chaos, cognition and the disordered brain
gamifying citizen science: lessons and future directions
volunteers engagement profiles in volunteer thinking systems
prediction, cognition and the brain
the psychology of self-determination
a meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation
self-determination theory: a macrotheory of human motivation, development, and health
the effect of virtual achievements on student engagement
gamification: designing for motivation
a universe of consciousness
characteristics of games
mobile & social game design: monetization methods and mechanics
mobile persuasion: perspectives on the future of behavior change
predictive coding, precision and synchrony
clarifying achievement goals and their impact
gamification: state of the art definition and utilization
beyond player types: gaming achievement goal
what can breakdowns and breakthroughs tell us about learning and involvement experienced during game-play?
making sense of game-play: how can we examine learning and involvement?
why people attend science festivals: interests, motivations and self-reported benefits of public engagement with research
half-real: video games between real rules and fictional worlds
psychological conditions of personal engagement and disengagement at work
thinking, fast and slow
the token economy: a decade later
employee engagement: a literature review
rna design rules from a massive open laboratory
deep and shallow gamification: the thin evidence for effects in marketing and forgotten powers of good games
engaging consumers through branded entertainment and convergent media
mixed methods in games research - playing on strengths and countering weaknesses
toward a theory of intrinsically motivating instruction
making learning fun - a taxonomy of intrinsic motivations
aptitude, learning, and instruction
gamification - a simple introduction. tips, advice and thoughts on gamification
the design of everyday things
volunteers' engagement in human computation astronomy projects
gaming for (citizen) science: exploring motivation and data quality in the context of crowdsourced science through the design and evaluation of a social-computational system. e-science workshops (esciencew)
galaxy zoo: motivations of citizen scientists
glued to games - how video games draw us in and hold us spellbound
the motivational pull of video games: a self-determination theory approach
rules of play: game design fundamentals. leonardo
the conscious, the unconscious, and familiarity
video game uses and gratifications as predictors of use and game preference
playing computer games: motives, responses and consequences
the free and happy student
creating a model to examine motivation for sustained engagement in online communities
education and information technologies
essential facts about the computer and video game industry
in-game, in-room, in-world: reconnecting video game play to the rest of kids' lives
nudge: improving decisions about health, wealth, and happiness
scienceathome, home
key: cord- -jl lj yh authors: amini, hessam; kosseim, leila title: towards explainability in using deep learning for the detection of anorexia in social media date: - - journal: natural language processing and information systems doi: . / - - - - _ sha: doc_id: cord_uid: jl lj yh explainability of deep learning models has become increasingly important as neural-based approaches are now prevalent in natural language processing. explainability is particularly important when dealing with a sensitive domain application such as clinical psychology. this paper focuses on the quantitative assessment of the user-level attention mechanism in the task of detecting signs of anorexia in social media users from their posts. the assessment is done through monitoring the performance measures of a neural classifier, with and without user-level attention, when only a limited number of highly-weighted posts are provided. results show that the weights assigned by the user-level attention strongly correlate with the amount of information that posts provide in showing if their author is at risk of anorexia or not, and hence can be used to explain the decision of the neural classifier. social media is a rich source of information for the assessment of mental health, as its users often feel they can express their thoughts and emotions more freely, and describe their everyday lives [ ]. this is why the use of natural language processing (nlp) techniques to extract information about the mental health of social media users has become an important research question in the last few years [ , ]. one of the main challenges of developing tools for the automatic detection of mental health issues from social media is providing justification for the decisions. mental health issues are still often stigmatised, and labelling a user as a victim of a mental health illness without a proper justification is not socially responsible. as a result, to be applicable in a real-life setting, automatic systems should not only be accurate, but their decisions also need to be explained. in the past decade, deep learning algorithms have become the state of the art in many nlp applications. by automatically learning the representation of useful linguistic features for the tasks they are performing, deep learning approaches have led to impressive improvements in most nlp tasks [ , ]. this also applies to the domain of nlp for mental health assessment, where recent deep learning models have led to state-of-the-art results in the field [ , , ]. however, despite achieving high performance, one of the most important drawbacks of these models is their black box nature, where the reasoning behind their decision is difficult to interpret and explain to the end users. this constitutes a serious setback to their adoption by health professionals [ ]. the focus of this paper is to assess the usefulness of the user-level attention mechanism [ ] as a means to help explain neural classifiers in mental health. although the experiments were performed on the detection of anorexia in social media, the methodology is not domain-dependent, and hence can be applied to other tasks involved in the detection of mental health issues of social media users, based on their online posts.
the paper is organized as follows: sect. explains the two levels at which the attention mechanism can be used (i.e. intra-document and inter-document), and describes the related work in validating explainability using the attention mechanism. section explains our experiments to validate the interpretability of user-level attention, whose results are then presented in sect. . section provides additional observations in terms of how the attention mechanism has worked. finally, sect. concludes the paper and provides future directions for the current work. the attention mechanism [ ] has become an essential part of many deep learning architectures used in nlp, as it allows the model to learn which segments of text should be focused on to arrive at a more accurate decision. in text classification applications, such as the detection of mental health issues, attention mechanisms can be applied both at the intra- and the inter-document levels [ ]. at the intra-document level, the attention mechanism learns to find informative segments of each document, and assigns higher weights to these segments when creating a representation of the whole document. the success of the intra-document attention mechanism has made it an essential part of transformers [ ], which are now the building block of several powerful nlp models, such as bert [ ]. on the other hand, the inter-document attention mechanism tries to identify entire documents that are more informative from a collection, and assigns higher weights to these when computing the representation of the whole collection. the inter-document attention mechanism is generally used when the classification pertains to the entire collection, as opposed to individual documents. previous work in nlp for clinical psychology has typically used this type of attention mechanism to create a representation of social media users: a collection of online posts from each user is fed to the model, and the inter-document attention (also referred to as user-level attention) creates a representation of the user through a weighted average of the representations of their online posts, where the most informative posts are assigned higher weights. while mohammadi et al. [ ] and matero et al. [ ] have used inter-document attention for the task of suicide risk assessment, maupome et al. [ ] and mohammadi et al. [ ] have utilized it for the detection of depression and anorexia, respectively.
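a minimal pytorch-style sketch of such a user-level (inter-document) attention layer is given below: each post representation is mapped to a scalar, the scalars are normalized with a softmax into attention weights, and the user representation is the weighted average of the post representations. this is a generic illustration of the mechanism, not the exact implementation of any of the cited systems.

import torch
import torch.nn as nn

class UserLevelAttention(nn.Module):
    def __init__(self, post_dim: int):
        super().__init__()
        self.scorer = nn.Linear(post_dim, 1)   # one scalar score per post

    def forward(self, post_reprs):             # post_reprs: (num_posts, post_dim)
        scores = self.scorer(post_reprs).squeeze(-1)              # (num_posts,)
        weights = torch.softmax(scores, dim=0)                    # attention weights over posts
        user_repr = (weights.unsqueeze(-1) * post_reprs).sum(0)   # (post_dim,)
        return user_repr, weights

posts = torch.randn(30, 128)                   # 30 posts, 128-dimensional representations
user_repr, weights = UserLevelAttention(128)(posts)
print(user_repr.shape, weights.shape)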
in this paper, we propose a quantitative approach, specifically focused on the user-level (inter-document) attention mechanism in a binary classification task of detection of a specific mental health issue, anorexia. our approach is based on monitoring the performance measures of a neural classifier, with and without user-level attention, when only a limited number of highly-weighted posts are provided. the neural classifier used is the cnn-elmo model from mohammadi et al. [ ] . this model was chosen because it achieved comparable results to the best performing model at the recent erisk shared task [ , ] , and is based on an end-to-end architecture, which makes the reasoning behind its decision more easily explainable. the trained model was first run on the testing data, and for each user, her/his posts were ranked from the highest attention weights to the lowest. we then ran the following two experiments: ) we tested the model by feeding it only the n top-weighted posts by each user. we gradually increased values of n from to , and monitored the performance of the system as n changes. the purpose of this experiment was to compare the performance of the model when all the posts are available, with when only the top-ranking posts (based on the attention weights) are available to the system. ) we replaced the user-level attention with a simple average pooling and re-ran experiment . the aim of this experiment was to evaluate the contribution of the user-level attention by ablating it from the model. the architecture of the cnn-elmo model is shown in fig. . for each user, her/his posts are first tokenized and then fed to an embedder, to extract a dense representation for each token. for the embedder, the original d version of elmo [ ] , pretrained on the billion word language model benchmark [ ] was used. for each post, unigram and bigram convolution filters were applied on the token embeddings. the output of the convolution filters were then fed to a concatenated rectified linear unit (crelu), and max pooling was applied to the output of the crelus. the output of the two max pooling layers were then concatenated and used as the representation for each post. the final user representation of a user was calculated by averaging (experiment ) or weighted averaging (experiment ) the representations of the available posts by that user. in order to calculate the weights, a single fully connected layer was applied to the representation of each post, mapping the post representation to a scalar. a softmax activation function was then applied over the scalars, which resulted in the weights corresponding to each post. the last layer of the model was comprised of a single fully-connected layer, mapping the user representation to a vector of size two. finally, by applying a softmax activation function over this vector, the probability for each user belonging to the anorexic/non-anorexic class was calculated. the dataset used is from the first sub-task of the erisk shared task [ ] , whose focus is the early risk detection of anorexia. the dataset consists of a collection of posts from the reddit social media, and is annotated at the userlevel, indicating whether a user is anorexic or not. for this work, we have focused on the detection of anorexia, without considering the earliness of the detection as the shared task does. table shows statistics of the training, validation, and testing datasets. 
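before turning to the dataset statistics referred to in the table, experiment ) can be sketched as follows: each user's posts are ranked by their attention weights, only the top n are kept, and the classifier is re-evaluated for increasing n, with and without user-level attention. the model and user interfaces used here (attention_weights, predict, posts, label) are assumptions made for illustration, not the authors' code.

```python
import numpy as np
from sklearn.metrics import f1_score

def top_n_ablation(model, users, max_n, use_attention=True):
    """Re-evaluate the classifier when each user contributes only their n
    highest-weighted posts, for n = 1 .. max_n (all names are illustrative)."""
    f1_by_n = {}
    for n in range(1, max_n + 1):
        predictions, gold = [], []
        for user in users:
            weights = np.asarray(model.attention_weights(user.posts))
            top = np.argsort(weights)[::-1][:n]          # indices of the n top-weighted posts
            kept = [user.posts[i] for i in top]
            predictions.append(model.predict(kept, use_attention=use_attention))
            gold.append(user.label)
        f1_by_n[n] = f1_score(gold, predictions)
    return f1_by_n
```

running the same loop with use_attention set to false corresponds to experiment ), where the user-level attention is replaced by simple average pooling.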
as the table shows, the data contains posts from users for training, users for validation, and users for testing, with an average of to posts per user. as indicated in losada et al. [ ] , the dataset was collected following the extraction and annotation method, proposed by coppersmith et al. [ ] . the anorexic users were self-identified by explicitly stating being diagnosed with anorexia on reddit, while the non-anorexic users were randomly crawled from the same social media. from the set of anorexic users, these specific posts which discussed being diagnosed with anorexia were removed from the dataset. the results from the experiments are shown graphically in fig. , and selected results are provided in table . as the solid lines in fig. show, by increasing the maximum number of available posts per user, the performance of the model with user-level attention (experiment ) generally improves in terms of accuracy, precision, and f , while the recall drops. it can also be observed that, the changes in performance measures decreases as the number of available posts increases, and the performance gradually converges to the final ones when all the posts are available (see table ). we believe that the gradual improvement in the precision and drop in recall is because, in general, the posts that have been highly weighted by the user-level attention mechanism, include signals that the user is anorexic (rather than signals that the user is not). the dotted lines in fig. a show that, by increasing the maximum number of available posts from to , the performance of model with the user-level average pooling (experiment ) also improves in terms of accuracy, precision, and f , but deteriorates in terms of recall. this shows that, the first highly-weighted posts included information necessary for the system to make a prediction about the user. this has even led the model with average pooling to have a higher f score than the model with user-level attention, as the former has a tendency to get less biased towards specific posts. table and fig. b show that, the f and the accuracy of the model with the user-level average pooling starts to drop from and posts, respectively. as a result, the model with user-level attention overtakes the one with average pooling in terms of f and accuracy, after more than and posts are available, respectively. this shows the higher capability of the model with the user-level attention over the other in handling the higher number of posts. figure also shows that increasing the maximum number of available posts leads to a rapid drop in the recall of the model with user-level average pooling. this shows that, the higher the number of available posts to the model with average pooling, the more this model loses the capability on observing the patterns that are useful in detecting anorexia. this can also support the hypothesis that the user-level attention mechanism generally assigns higher weights to the posts that are more signalling of anorexia. table . performance of the system (in percentage) in terms of the maximum number of highly-weighted posts from each user. the columns labelled as with avg pool refer to the model in which the user-level attention mechanism is ablated. the last row refers to the case when all the posts from each user are provided to the system. in order to further analyze the behavior of the user-level attention mechanism, the highest weights assigned by the attention mechanism were studied across users. 
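the weight analysis described next, averaging the n-th highest attention weight separately for users predicted as anorexic and as non-anorexic and then comparing the two groups, can be sketched as follows; the data layout and names are illustrative only.

```python
import numpy as np

def mean_nth_highest_weight(weights_per_user, predicted_positive, n, positive=True):
    """Average of each user's n-th highest attention weight, restricted to users
    predicted as anorexic (positive=True) or non-anorexic (positive=False).

    weights_per_user: list of 1-D arrays, one array of post weights per user.
    predicted_positive: the model's predicted labels (illustrative interface)."""
    selected = [np.sort(w)[::-1][n - 1]
                for w, p in zip(weights_per_user, predicted_positive)
                if p == positive and len(w) >= n]
    return float(np.mean(selected))

# ratio between the two predicted groups for the single highest weight (n = 1)
# ratio = (mean_nth_highest_weight(W, preds, 1, True)
#          / mean_nth_highest_weight(W, preds, 1, False))
```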
in addition, we also calculated the average of the n-th highest weights assigned to the posts by the users, with n ranging from to . we compared these values for two types of users: labelled by the model as anorexic (i.e. true-positive and false-positive users) and labelled by the model as non-anorexic (i.e. true-negative and false-negative users). as table shows, on average, the attention mechanism has assigned . higher weights to the most highly weighted posts in users detected as anorexic, compared to users detected as non-anorexic. the value of this ratio drops in the lower-ranked posts. this seems to indicate that, generally, when the attention mechanism assigns a high weight to a post, the system is more likely to label its author as positive. it is similar to when humans observe a piece of evidence and tend to heavily base their decision upon it. this also seems to support the hypothesis that the attention mechanism assigns weights based mostly on how signalling the posts were of their authors being anorexic, as opposed to signalling not having anorexia. as opposed to jain and wallace [ ] and serrano and smith [ ] , who reported that attention is not a means to explainability, our findings are generally in favor of explainability in the user-level attention mechanism. this may be due to the following two reasons: . the approach by jain and wallace [ ] was only focused on the explainability of the attention mechanism when applied on the output of a recurrent encoder. we argue that, in such a case, each sample (contextual word representation, in their case) already has part of the information from the other samples in the context. as a result, finding the source of information is difficult in such a case. serrano and smith [ ] also experimented with using attention over non-encoded samples, and showed that the level of explainability in this case is significantly higher than when the input to the attention is encoded (using an rnn or cnn). however, they mainly focused their report on the cases where the attention input is encoded. our work was fully focused on non-encoded attention inputs. . the nature of the task we are performing is generally different from that of jain and wallace [ ] and serrano and smith [ ] , as our approach focuses on the user-level (inter-document) attention mechanism, while their experiments were focused on intra-document attention. in a task involving the detection of a mental health problem, such as anorexia, relevant and informative posts are quite rare [ ] [ ] [ ] [ ] , while in an intra-document task there may be several ways of inferring information from a particular document. finally, in order to achieve stronger evidence that an inter-document attention is explainable, we believe that our approach would benefit from being used in conjunction with the experiments proposed by jain and wallace [ ] and serrano and smith [ ] , as their experiments can also be applied to the inter-document attention mechanism. in this work, we proposed a quantitative approach to validate the explainability of the user-level attention mechanism for the task of the detection of anorexia in social media users based on their online posts. our results show that the user-level attention mechanism has assigned higher weights to the posts from a user based on how much they were signalling the user is at risk of anorexia. two directions for future work can be proposed: as indicated in sect.
, the first direction is to complement the current experiments with the ones proposed by jain and wallace [ ] and serrano and smith [ ] , in order to see if the findings from the current experiments are in line with theirs. the second direction is to expand the current set of experiments to other mental health binary classification tasks (such as detection of depression, ptsd, or suicide risk), and later to multi-class or multi-label classification tasks in the field of nlp for clinical psychology. neural machine translation by jointly learning to align and translate one billion word benchmark for measuring progress in statistical language modeling quantifying mental health signals in twitter deep learning in social computing bert: pre-training of deep bidirectional transformers for language understanding interpreting recurrent and attention-based neural models: a case study on natural language inference neural network methods in natural language processing hierarchical neural model with attention mechanisms for the classification of social media text related to mental health attention is not explanation deephealth: deep learning for health informatics interactive visualization and manipulation of attention-based neural machine translation emotional disclosure on social networking sites: the role of network structure and psychological needs a structured self-attentive sentence embedding a test collection for research on depression and language use clef lab on early risk prediction on the internet: experimental foundations overview of erisk : early risk prediction on the internet (extended lab overview). in: working notes of clef -conference and labs of the evaluation forum overview of erisk : early risk prediction on the internet. in: working notes of clef -conference and labs of the evaluation forum clpsych shared task: predicting current and future psychological health from childhood essays suicide risk assessment with multi-level dual-context language and bert inter and intra document attention for depression risk assessment clac at clpsych : fusion of neural features and predicted class probabilities for suicide risk assessment based on online posts quick and (maybe not so) easy detection of anorexia in social media posts proceedings of the conference of the north american chapter of the association for computational linguistics: human language technologies (naacl-hlt ) is attention interpretable? in: proceedings of th annual meeting of the association for computational linguistics (acl ) attention is all you need attention-based lstm for aspect-level sentiment classification clpsych shared task: predicting the degree of suicide risk in reddit posts acknowledgements. the authors would like to thank the anonymous reviewers for their feedback on a previous version of this paper. this work was financially supported by the natural sciences and engineering research council of canada (nserc). key: cord- - nfp hcs authors: gong, liang; söderlund, henrik; bogojevic, leonard; chen, xiaoxia; berce, anton; fast-berglund, Åsa; johansson, björn title: interaction design for multi-user virtual reality systems: an automotive case study date: - - journal: procedia cirp doi: . /j.procir. . . sha: doc_id: cord_uid: nfp hcs virtual reality (vr) technology have become ever matured today. various research and practice have demonstrated the potential benefits of using vr in different application area of manufacturing, such as in factory layout planning, product design, training, etc. 
however, along with the new possibilities brought by vr come new ways for users to communicate with the computer system. the human computer interaction design for these vr systems becomes pivotal to their smooth integration. this paper reports a study that investigates interaction design strategies for a multi-user vr system used in a manufacturing context through an automotive case study. virtual reality (vr) technology has attracted increasing attention in various industries during the past decade. this is the result of the ever-maturing vr technology as well as its great potential to improve existing practices in different industries. the manufacturing industry is among those actively pushing to find applications for vr. a growing number of studies shows that much effort is being spent on vr applications in the manufacturing industry [ ]- [ ] . these studies have shown that vr technologies have great advantages in areas like factory layout planning, product design, and training, especially for globally distributed manufacturing companies that have functional teams located in different parts of the world [ ] , [ ] - [ ] . however, while the studies have shown the promising future of vr usage in manufacturing, they have also pointed out new challenges that hinder the vr integration process [ ] , [ ] , [ ] . the usability-related issues of vr systems are among the most important to be addressed, as they affect end-user acceptance, which ultimately affects the process of vr integration in manufacturing. these issues result from the new interaction mediums that come with vr systems. different from the "window, icon, menu, and pointing device" (wimp) or direct manipulation interaction style, vr systems use handheld controllers, haptic devices or even voice and gesture recognition for users to interact with the systems. jacob et al. pointed out that there is a lack of established interaction design principles and frameworks for post-wimp systems [ ] . in this study, we have chosen a globally distributed manufacturing company as the case to study the different possibilities of interaction design approaches for the multi-user vr system used in the design review process. two iterations of vr system development and evaluation have been conducted. the research process as well as the findings are presented and discussed in the rest of the paper. vr technology is not new; the first vr system was successfully implemented with an hmd that presents a user with a stereoscopic d view slaved to a sensing device, which tracks the user's head movement [ ] . ever since then, research effort has continued to be spent on this area. korves and loftus described vr systems based on their different setups and categorized them as: desktop systems, wide-screen projection systems, immersive cave systems, and immersive vr systems using hmds [ ] . thanks to the latest advancements in the hardware and software of vr devices, immersive vr systems with hmds are becoming more and more viable in many industries; thus, they have again attracted much attention in academia and industry during the last years. in this emerging domain of vr, there are many studies on the different issues of integrating it into the manufacturing process. most of them have shown that vr can improve existing practice in the manufacturing industry. for example, wiendahl and harmschristian have shown that immersive vr is an important tool for collaborative factory planning, especially when multiple viewpoints of users are visualized [ ] . menck stated that vr-based collaborative planning tools can extend communication and facilitate cooperation beyond existing organizational boundaries, which would reduce the complexity of work and increase work efficiency [ ] . at the same time, some studies also pointed out that vr systems are bringing new challenges for both developers and end users [ ] , [ ] , [ ] . those challenges can be categorized into two major groups. one is associated with creating a realistic virtual environment. the challenge lies in the complexity of generating models realistic enough to represent the real world so that enough context is provided to perform the intended tasks in vr. this involves data source integration and data compatibility issues when reusing data that already exists for other purposes in vr [ ] - [ ] . the other is the usability-related issues of vr systems. it is reported that some early users of vr systems have difficulty understanding the interaction logic and some even feel motion sick [ ] , [ ] .
instead of the already established styles of human computer interaction for desktop or mobile applications, vr interaction design is new and needs further studies to develop supporting guidelines and frameworks. another point worth noting is that most reported vr systems are designed for a single user. in the real world, multi-user engagement is the norm for most activities. therefore, one argument is that if it is not multi-user, then it is not virtual reality. based on the above description, this study has chosen the immersive vr system with hmds as the basic setup to study the interaction design of multi-user vr systems for manufacturing. an automotive company was selected as the case to study how interaction design would affect manufacturing companies in adopting the multi-user vr system. the company's research and development department is located in gothenburg, sweden, while the manufacturing plants are situated across different regions of china. this makes the selected company a perfect case for the purpose of this study. the team that was actively involved in this study was the manufacturing engineering (me) team. they are mainly responsible for developing all body-in-white (biw) processes. therefore, the industrial base of the study was set as the fixture design and review process using the multi-user vr system. the first step to develop the multi-user vr system is to gather the data needed for the intended task. for the purpose of this study, we used two types of data sources: cad models of the new fixture design and point cloud data of the factory environment. the cad models are created by the me team using catia tm [ ] in the jt format. however, as the jt files contain much information that is irrelevant for the vr scenario, such as the internal structure of the fixture, the provided jt files were further optimized using pixyz [ ] by removing unnecessary surfaces and converting the files to the fbx format, which can be directly imported into the vr development platform. the point cloud data was captured onsite at the chinese manufacturing plants using d laser scanning technology. it is a realistic virtual representation of the real factory environment. studies have shown that the point cloud environment provides more accurate contextual information compared to the simplified, completely computer-generated cad models. the disadvantage of point cloud data is that there are no surfaces or triangle meshes, but millions of points, which requires excessive computing power to render in real time for a vr system. it is also difficult to interact with the point cloud data in vr due to this limitation. therefore, the point cloud data was processed with an adjusted level of density based on the area of interest. simple surfaces such as the floor and walls were also meshed using mathematical algorithms (a sketch of this preprocessing step is given below). unity d [ ] was chosen as the development platform to integrate the different data sources and program the needed functionality. the unity plugin pun [ ] was used to handle the networking and synchronization for different users. htc vive headsets were used as the vr hardware throughout the study. in this study, two versions of the multi-user vr system were developed in sequence. the pilot version aimed at testing the feasibility of the multi-user concept as well as getting feedback on the desired functionalities for the second version. therefore, minimal effort was spent on user interface design, and the focus was on making the connection reliable and the synchronization low-latency.
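the paper does not state which tools were used to thin the point cloud and to detect simple surfaces such as the floor; the sketch below shows one way this preprocessing step could look, assuming the open3d library (version 0.10 or later) and illustrative file names.

```python
import open3d as o3d

# Load the raw factory scan and reduce its density; a finer voxel size could be
# used around the area of interest and a coarser one elsewhere.
scan = o3d.io.read_point_cloud("factory_scan.ply")          # path is illustrative
reduced = scan.voxel_down_sample(voxel_size=0.05)           # metres per voxel

# Detect the dominant plane (e.g., the floor) so it can later be replaced by a
# simple mesh, and keep the remaining points as the down-sampled environment.
plane_model, inliers = reduced.segment_plane(distance_threshold=0.02,
                                             ransac_n=3,
                                             num_iterations=1000)
floor_points = reduced.select_by_index(inliers)
environment = reduced.select_by_index(inliers, invert=True)
o3d.io.write_point_cloud("factory_scan_reduced.ply", environment)
```

the detected plane could then be replaced by a lightweight mesh in the vr scene, while the remaining points are rendered at reduced density.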
the vr view is shown in fig. . a second version was later developed based on the feedback gathered through the pilot workshops, with additional features such as customized user avatars. moreover, two sets of interaction design approaches were implemented to further understand users' preferences in a multi-user vr system. in addition to the common practice that uses d trackable controllers to trigger different functionalities in vr (illustrated in fig. ), d graphic user interfaces (gui) were added on top of the projected screen so that a conventional mouse and keyboard can also be used to activate the same functionalities (illustrated in fig. ). the pilot version was tested on two occasions. it was first tested for the cross-continent connection from gothenburg in sweden with fellow colleagues at the national institute of standards and technology (nist) in the united states. this test was to verify the stability of the connection and the quality of real-time synchronization of both objects and audio. after the verification, it was later tested in a workshop hosted at the university site in sweden. five engineers from the case company as well as two senior researchers from the fraunhofer-chalmers centre (fcc) participated in the workshop. all participants tested the multi-user vr system in pairs and performed the intended design review task in vr. after the hands-on experience with the system, a follow-up discussion was held to get feedback and potential improvements for the second version. the second version was developed based on the feedback collected from the pilot workshops. it was then tested in the case company's facility in gothenburg. participants from different groups of the case company joined the test. the participants' roles ranged from process developer to vice president of the case company. a tutorial session was held to introduce the study and the basic functions of the multi-user vr system. then all participants tested it in pairs to perform the design review task (illustrated in fig. ) . after the test, each participant filled in a scale-rating questionnaire regarding their general experience of the interaction design, and a semi-structured interview was held to collect qualitative data for better analysis of the rating results. the two tests have shown that the multi-user vr system can provide enough context for relevant engineers from different sites to collaboratively perform the fixture review process in vr. the connection is stable, while objects of interest and audio synchronization are reliable. the scale-rating questionnaire is a series of statements for the participants to rate based on their experience. the results are analyzed together with the qualitative feedback from the semi-structured interview. a portion of the questionnaire results is illustrated in fig. , with the statements listed on the vertical axis. it shows that all participants believe this type of multi-user vr system would be of great benefit for their daily work. the customized user avatars allow users to easily identify their colleagues in vr, which increased their feeling of presence in the virtual environment. most participants still prefer controller interaction to the d gui, but when there is an admin role that supervises the vr review session from the projected screen, it would be preferable to have the d gui for the admin to control the review process. when it comes to the question of whether all users should have the same functions, the views are diverse.
while out of participants believe that the point cloud data makes the virtual model more realistic, some questioned its necessity, especially for design review tasks that are independent of the factory environment. in this study, a multi-user vr system was developed to support the design review process for globally distributed manufacturing companies. we focused on the interaction design of such a system through two iterations of development and evaluation. it has affirmed that the interaction design of vr systems is of great importance for manufacturing companies to widely adopt and benefit from the latest advancements of vr technologies. it is generally believed that the quality of the vr environment is pivotal to the user experience, which affects whether vr systems can be widely adopted and used in manufacturing companies [ ] - [ ] . to put it another way, previous findings emphasize the importance of creating a realistic virtual environment to attract users to switch from conventional work procedures to vr. the feedback we collected partly affirms this, as features such as customized user avatars and the point cloud factory environment, which improve the realism of the vr environment, received positive ratings from the participants. however, the test results also hinted that not all design review work would need those features. another aspect to consider is that those features take longer and require much more effort to implement. therefore, it should be judged case by case whether to add those quality-improvement features to vr systems. the new interaction medium brought by the vr system is a challenge for both developers and end users. today, most people are used to wimp interaction from the personal computer as well as touch screens on smartphones and tablets. due to the specific setup with an hmd in vr systems, it is impossible to transfer the mature interaction design approaches from those platforms directly to the vr world. instead, handheld controllers with multiple buttons and various haptic devices are created to support the interaction in vr. however, these devices and their interaction logic are new to most users. previous studies have shown that this might be one of the hindrances to vr usage [ ] , [ ] , [ ] . the result of this study shows that the new interaction devices did make it difficult for new users to engage in the tasks at the beginning, but it also shows that going backward to find ways of using the mature mediums in vr is not a favorable option either. most users expressed the view that they could learn the new ways rather fast, and the tests also showed that most of them managed to use the different functions smoothly at the end. however, there is no established knowledge or standard that they can refer to when they are first presented with a vr system. this should be improved by a joint effort from academia and the vr industry to further study and establish a uniform framework for interaction design in vr. additionally, technologies such as voice and gesture recognition are becoming increasingly viable with the latest developments in deep learning and artificial intelligence (ai) [ ] . many practitioners and researchers are starting to integrate those new technologies to explore new ways of user interaction in vr [ ] , [ ] . due to the complex nature of the manufacturing environment, there might be some negative factors, such as factory noise levels being too high for voice recognition.
it is still worth the effort to have future studies that can explore voice and gesture interaction for vr systems used in the manufacturing industry. this study has set out to investigate appropriate interaction design approaches for multi-user vr system used in the me process. a multi-user vr system was developed and evaluated to in two iterations with academia and industrial partners. the results indicates that multi-user vr system can complement and improve the existing me processes. new interaction medium comes with vr system is a challenge for the wider usage in manufacturing, but also an opportunity to further develop principles and standards of interaction design so that the manufacturing companies can also enjoy then benefit brought by the latest advancement of vr technologies. an immersive and collaborative visualization system for digital manufacturing introducing quantitative analysis methods into virtual environments for real-time and continuous ergonomic evaluations a novel facility layout planning and optimization methodology adaptation of high-variant automotive production system using a collaborative approach virtual manufacturing as a way for the factory of the future the use of virtual reality techniques during the design process: from the functional definition of the product to the design of its structure improvement of manufacturing processes with virtual reality-based cip workshops virtual and augmented reality technologies for product realization virtual factory: an integrated framework for manufacturing systems design and analysis virtual reality applications in manufacturing industries : past research , present findings , and future directions reality-based interaction: a framework for post-wimp interfaces a head-mounted three dimensional display the application of immersive virtual reality for layout planning of manufacturing cells virtual factory design -a new tool for a co-operative planning approach collaborative factory planning in virtual reality a survey on human-computer interaction in mixed reality state of the art of the virtual reality applied to design and manufacturing processes rationalizing virtual reality based on manufacturing paradigms virtual reality-based approach to manufacturing process planning virtual reality approaches for immersive design virtual reality and d imaging to support collaborative decision making for adaptation of long-life assets real walking in virtual environments for factory planning and evaluation catia d experience pixyz software what is unity? photon unity networking framework for realtime multiplayer games and applications a novel vr tool for collaborative planning of manufacturing process change using point cloud data a fast parametric deformation mechanism for virtual reality applications virtual plastic injection molding based on virtual reality technique immersive authoring of tangible augmented reality content : a user study tangible user interfaces: past, present, and future directions deep learning-based human motion recognition for predictive context-aware human-robot collaboration mr : an interdisciplinary framework for mixed reality experience design and criticism cognitive load by context-sensitive information provision using binocular smart glasses in an industrial setting this work is funded by vinnova (swedish agency for innovation systems) through the summit project. this work is also within the sustainable production initiative and the production area of advance at chalmers university of technology. 
the support is gratefully acknowledged. i would also like to express my sincere thanks to all the medical staff that are fighting with the covid- . keep safe and victory is ours! key: cord- - mglhh authors: jovanović, mladjan; baez, marcos; casati, fabio title: chatbots as conversational healthcare services date: - - journal: nan doi: . /mic. . sha: doc_id: cord_uid: mglhh chatbots are emerging as a promising platform for accessing and delivering healthcare services. the evidence is in the growing number of publicly available chatbots aiming at taking an active role in the provision of prevention, diagnosis, and treatment services. this article takes a closer look at how these emerging chatbots address design aspects relevant to healthcare service provision, emphasizing the human-ai interaction aspects and the transparency in ai automation and decision making. conversational systems, such as amazon alexa and google assistant, are entering our everyday lives. beyond such all-in-one systems, there are growing demands for building conversational services in healthcare. the services use a shared design metaphor - a personal assistant that provides healthcare through natural conversation. the main reason is making online healthcare more user-friendly - an agent takes a patient through a turn-taking dialog, similar to how doctors do [ ] , [ ] . the recent transformation of digital healthcare aims at providing personalized health services and helping patients in self-managing their conditions [ ] . chatbots are becoming part of this paradigm shift as a cost-effective means to deliver such services [ ] . besides, they facilitate well-being by ingraining positive self-care habits [ ] . the main benefits are ease of use and accessibility - the conversation metaphor makes them more intuitive, available on smartphones everywhere, anytime [ ] . however, healthcare provision still depends on health professionals. this inspires researchers and practitioners to explore the potential of conversational ai to bring personalized services through automation [ ] . the growing number of healthcare chatbots, partly due to the democratization of chatbot development, motivates a closer look at how these systems address aspects concerning user experience, adoption and trust in automation, and healthcare provision. while a recent literature review provides insights on general design considerations for healthcare chatbots [ ] , we focus on publicly available chatbots and take a more domain-specific approach in identifying relevant design dimensions considering the specific role in healthcare provision. we report on a systematic analysis of publicly available healthcare chatbots. due to their very nature, we believe that the form and function of healthcare chatbots cannot be neatly separated and are equally important. this paper: • identifies salient service provision archetypes that characterize the emerging roles and functions the chatbots aim to fulfill; • assesses the design choices concerning domain-specific dimensions associated with health service provision and user experience; • provides implications for theory and practice that highlight existing gaps. a healthcare chatbot can be conceptualized as a set of interconnected layers. the knowledge layer contains the domain and user databases. the information from this layer is an input for the service layer of healthcare provision. this layer implements healthcare decision-making processes.
once it generates the decisions, they are communicated to a dialog layer. the rule-matching dialogs are robust and straightforward to build but work in a constrained domain, whereas probabilistic (machine learning) tools may provide more natural dialog but lack robustness [ ] . the dialog layer extracts user intentions, creates responses by consulting the service layer, and communicates them to the presentation layer that implements a text-or voice-based ui. we introduce the analytical framework containing the attributes to characterize and compare existing healthcare chatbots. the framework captures the domain-specific aspects of healthcare provision, emphasizing the human-ai interaction aspects and the transparency in ai automation and decision making. the dimensions are summarized below and detailed in table : • conversational style. while deploying suitable and successful dialog strategies is still an open challenge in human-ai interactions, some domain-specific dimensions emerge from research in health information systems [ ] and recent general guidelines for human-ai interaction [ ] . sociability, empathy, understandable medical vocabulary, and emerging dialog styles are among the key design dimensions we analyze. • understanding users. the users' ability to express intentions and be understood by the chatbots is another fundamental challenge in dialog-based interactions. data collection methods (explicit or implicit) and the ability to recover from conversation breakdowns are among the critical functions of healthcare, shaping user's expectations of natural dialog capabilities. • accountability. there are ethical and practical reasons for making ai more transparent and explainable. not only concerning model biases and privacy concerns but also to understand the reasoning behind algorithmic decisions that could have a significant impact on healthcare service provisioning [ ] . the implementation of these features can address privacy concerns, build trust, and make the service provisioning more accountable for users [ ] . • healthcare provision. the domain-specific aspects of service provisioning include the type of the chatbots' role, emerging functional archetypes within these roles, collaboration facilitated by the chatbot, and continuity of service delivery. using the analytical framework, we identified and characterized publicly available healthcare chatbots in the english language, as of august . starting from the health provisioning roles, we analyzed how the other chatbot design dimensions are implemented for the primary functions. to this end, we screened health-related chatbots from two popular databases, botlist (https://botlist.co) and chatbots.org (https://chatbots.org) in the categories "health and fitness" and "body health". we included other available, well-known examples that are often analyzed in the scientific literature and appear at the top search results when searching for chatbots for health. our list is a representative sample with a clear overview of the chatbots' use. a total of chatbots were screened and annotated by health provisioning roles by two researchers, resulting in relevant health chatbots (coding agreement %). two researchers independently annotated the chatbots' functions in an emergent coding scheme, which was then consolidated by consensus to describe the salient archetypes for each role. 
the analysis focused on archetypes that describe a direct involvement of the chatbot in the healthcare service provisioning, emulating the functions of a healthcare professional (e.g., in performing a diagnosis, or delivering a therapy). this was the case for of the archetypes identified. we excluded chatbots from the archetypes "support for diagnosis", "access to healthcare" and "support for therapy", where chatbots act as mediators to facilitate access to healthcare services, information and products. accordingly, we had around - top chatbots from archetypes, for a total of chatbots. we selected popularity (e.g., number of views and likes) as a measure of quality and adoption, thus focusing the analysis on the most widely adopted chatbots. a scoring system was derived to complement the qualitative observations and describe the level of implementation of each dimension: low, indicating that the dimension was not explicitly addressed (e.g., explainability: the chatbot does not provide any explanations for its decisions); medium, showing partial implementation (e.g., explainability: the chatbot provides some evidence, but important details are still missing); high, meaning a high degree of implementation (e.g., explainability: the chatbot explains major decisions). the supplementary material, including details about the process, scoring system, dialog examples and resulting annotated dataset, is available at https://cutt.ly/tdozkpm. the dimensions of the analytical framework (table ) are detailed as follows. sociability. sociability is regarded [ ] as a premise of the effectiveness of a health intervention. as a property (content) of the conversation itself, it builds and maintains social bonds among interactants [ ] . in this regard, we look at whether chatbots implement social conversation capabilities. empathy. another desirable characteristic of chatbots is exposing empathy, the ability to recognize users' emotions and respond appropriately to the current mood [ ] , [ ] , [ ] , [ ] , and even more so in vulnerable scenarios posed by health services. thus, we qualitatively assess if chatbot dialogs provide empathy cues in their conversations. vocabulary. adapting the conversation content to a suitable and understandable medical vocabulary is also important for the quality of the healthcare provision [ ] . we analyze strategies and features adopted by chatbots to address this aspect explicitly. proactivity. a mix of proactive and reactive behavior is another inherent feature of everyday human communication that ai aims to replicate [ ] , and that can inform how services are provided. we examine whether chatbots display proactive behaviors in providing their services. data collection. an important aspect is understanding the input patterns and data collection methods enabled by chatbots, as they inevitably balance the robustness and naturalness of conversations [ ] . in this regard, we qualitatively assess emerging input patterns, and determine whether the chatbots leverage explicit and implicit data collection strategies. error recovery. error recovery strategies are crucial for addressing breakdowns and preventing them from degrading the user experience and leading to incorrect decisions [ ] . we assess whether chatbots implement error recovery strategies, focusing on the ability to deal with human error. explainability. we define explainability as the ability of the chatbot to inform and explain its decisions (e.g., how a diagnosis was reached, or why an activity program was changed). transparency. we look at the transparency with regard to data collection practices (e.g., why the chatbot is collecting certain information). role. it indicates the chatbot's role(s) in healthcare provision as diagnosis, prevention, and therapy. some chatbots may play multiple roles. archetype. it describes emerging service patterns within the health provision role. collaboration. we consider the collaboration enabled by the chatbot, together with proper integration with the healthcare infrastructure, as a means for augmenting the skills of medical professionals [ ] . when analyzing collaboration, we focus on identifying the stakeholders involved and the type of technology-mediated interactions enabled by the chatbot. continuity. it refers to the time of the service delivery, whether in one-time sessions (akin to short-term visits) or leveraging the opportunity for more continuous healthcare delivery [ ] , [ ] . we detail our analysis in the following sections, describing the emerging archetypes and salient design features as characterized by our framework. diagnostic chatbots check users' symptoms and recommend courses of action. three general archetypes of diagnosis chatbots emerged from our analysis: • support for diagnosis ( / ). this archetype does not perform the diagnosis itself but instead supports a diagnosis by either i) facilitating access to health services, such as the pathology lab chatbot facilitating access to doctors and scheduling visits, ii) supporting online consultations with health professionals, such as icliniq, which pairs up users with doctors for online consultation, or iii) providing conversational access to information regarding symptoms and diseases, such as webmd. • general symptom checker ( / ). this archetype mimics a consultation with a general health professional, walking users through a series of questions regarding their symptoms to diagnose a condition and, in some cases, suggest a course of action. a prominent example is healthtap, a chatbot that collects symptoms and provides potential causes in dialog-based interactions. • specific symptom checker ( / ). this archetype aims at either i) helping users confirm the presence and severity of an ailment, or ii) diagnosing a particular condition, akin to having a consultation with a medical specialist. an example from the first category is feverbot, which helps users determine whether they require medical attention, and for the second, the mental care bot, which specializes in diagnosing mental disorders. the archetypes have different foci but follow a typical dialog structure, consisting of profiling the user, collecting and refining symptoms, diagnosis, and follow-up. this process is typically enacted in one-time sessions involving a user and the chatbot, not reusing previously collected information - even though some chatbots offer symptom journaling (e.g., healthtap). collecting and refining symptoms is approached with different dialog styles. specifying symptoms in natural language (e.g., "i have back pain") has varying levels of success. the chatbots try to identify the symptom either directly from the user input (e.g., your.md), by directing the user input to a search page (e.g., babylonhealth), or by a combination of both.
follow-up questions to refine the symptoms (e.g., "which part of your back is hurting") display a closed list of predefined options (e.g., "lower back" or "upper buttock area") requiring users to select an option from a list (e.g., ada), or swiping through illustrated cards (e.g., healthtap). the symptom checkers for skin problems (e.g., skinive) have the possibility of uploading pictures to bootstrap the diagnosis, using computer vision to interpret the input. interestingly, none of the chatbots make use of implicit data collection (e.g., sensor data), but collect user information explicitly during the conversations. allowing users to edit and backtrack information is an error recovery mechanism absent in almost half of the chatbots (e.g., buoyhealth provides an "edit" option on each user input). the majority of chatbots interact with users following a scripted interview without being cautious of the users' responses. making technical language understandable is a strategy implemented explicitly by only three chatbots. they address this aspect by either including contextual help for each question (e.g., ada: "what does it mean"), presenting pictures of the symptoms and options (e.g., healthtap), and indirectly by requesting feedback on questions (e.g., buoyhealth: "this is confusing"). transparency clarifies potential privacy concerns regarding sensitive questions. despite its importance, only one chatbot explicitly addressed this issue, namely buoyhealth, allowing users to inquire about the reasons behind each question ("why am i being asked this?"). the typical diagnosis report is a list of potential causes, explaining the reasoning behind and courses of action. it consists of the information describing: i) strength of the evidence supporting the diagnosis, ii) symptoms present for the cause, iii) type of care recommended and the specialist needed, and iv) possible actions. except for three chatbots, the majority explains their decisions by providing evidence connecting reported symptoms to the potential causes. all chatbots explicitly inform users that the report does not replace a medical consultation. the chatbots in this role assist in tracking and building awareness of a user's health and help prevent health declines by building desirable habits. the prevention is offered through a range of services [ ] that can be aggregated into three archetypes: • access to healthcare ( / ). the chatbots from this archetype do not participate in the provision of healthcare service but represent an entry point to using these services. its main goal is to increase the efficiency of healthcare services by reducing the effort and increasing the speed of access. they do this by i) connecting patients to healthcare professionals, ii) discovering medical drugs online, or iii) providing healthcare customer service tasks. for example, the iclinic provides / medical customer service for patients, such as booking appointments with their doctors. the project alta facilitates the discovery and purchase of pills for improving cognitive functions. • health education ( / ). the educational archetypes prevent by teaching users on prevention procedures for specific health conditions. for instance, doctorbot provides healthcare information on different topics. a very recent example is jennifer, a chatbot designed to combat misinformation and answer questions on the covid- virus. • health coaching ( / ). its goal is to prevent health degradation by improving general wellbeing and inducing a healthy lifestyle. 
at its core are psychological incentives to maintain or facilitate desirable behaviors. their functions can be categorized as i) personalized reminders for mental exercise and workouts, ii) psychological motivators for mental and physical practices, or iii) advisors on positive habits regarding sleep, nutrition, and well-being. for example, fitcircle uses reputation-based incentives for exercising (such as goal-setting and progress information), while stopbreathe&think recommends mental exercise for psychological well-being. forksy is a nutrition assistant that advises on nutrients tailored to the user's health goals and eating habits. concerning conversational style, the first archetype specializes in a specific task, such as connecting with a doctor, booking an appointment, or ordering a particular medicine. the latter two are flexible in the sense that they educate on related but different topics, or coach on a range of well-being activities within a type or across several types (physical, mental, nutritional). all archetypes offer instrumental, goal-oriented conversations in which they guide users through predefined programs of exercises. regarding the vocabulary, less than half of the chatbots referred the users to external glossaries of terms for additional explanations. the archetypes employ proactive conversation during prevention, by probing users for necessary information. the user information is collected explicitly, from the conversations. error recovery is present as either asking the users to rephrase the misunderstood input or jumping to the beginning of the dialogs. regarding accountability, the archetypes do little to explain specific decisions to the users (e.g., they offer only general explanations on their websites). the transparency of the archetypes is low, in the sense of not highlighting nor clarifying the reasons for collecting data from their users. the chatbots from the first archetype follow a scripted, question-answering dialog flow without keeping conversation history. for example, healthy recipe hq implements on-demand, dynamic question answering by aggregating available information online. the second archetype implements a similar dialog structure while educating on a specific topic. the third archetype offers a more complex dialog with flexible conversation based on the history and user profiling. health coaching chatbots recommend actions based on user monitoring through sustained conversation. the actions originate from pools of activities including workouts, nutrition plans, and mental exercise programs. they continuously motivate users towards a healthy lifestyle by combining psychological incentives that include self-reflection on achieved progress, reminders for activities, and evidence from peers. some health coaching chatbots connect users with peers, such as fitwell. otherwise, the majority targets individuals. this role assists in or provides treatment of specific health declines or conditions (such as pregnancy or a therapeutic diet). the therapy services can be grouped into the following archetypes: • support for therapy ( / ) . this archetype assists during the phases of the treatment. examples are personalized reminders for medication adherence as part of the therapy (e.g., florence), or listing medicines based on positive online user reviews for natural health cures (e.g., healthrobot). • health therapy ( / ). the therapist archetype takes a more active role by providing at-home therapy for its patients.
based on their primary target, they offer either i) drug-based therapy or ii) practice-based therapy for their patients. the first sub-archetype recommends and tracks medicine use during the treatment (such as florence). the second sub-archetype provides practical guidance on the activities for successful treatment. for instance, ketobot suggests a ketogenic diet to fight against diabetes. a third archetype, cognitive behavioral therapy (cbt), provides a range of therapies that target specific mental states and emotions. the therapy is a structured, guided conversation that starts with question answering to identify the patient's condition. it continues by recommending specific exercises based on the estimated conditions and tracking the target state. the measures of the treatment's progress are self-reported, provided by patients as free-form text. woebot is a personalized mental therapist that tracks users' mood and suggests mental activities. wysa aims at improving patients' mental health by providing emotional support. the common goal is to build resilience to mental disorders (e.g., stress, depression and anxiety) by developing positive habits (e.g., self-awareness and optimism). the archetypes support multiple activities (e.g., facilitating access to different types of medicines) or multiple health conditions. the cbt archetypes try to understand and respond to the users' current mood. this aspect is entangled with social elements, such as engaging in small talk on non-treatment topics. this increases the amount of conversation, and specifically of user-provided data, which improves the accuracy of guessing the user's emotions. as for therapy-specific terms, the chatbots offer explanations during the conversations. the first archetype induces conversations through personalized reminders, whereas cbt chatbots initiate the dialogs on a time basis. the error recovery strategies include restarting the current conversation or asking additional questions for mutual understanding. user data are collected explicitly, from user input during conversations. concerning accountability, only a minority of the therapy chatbots explain their decisions to users and clarify the reasons for collecting specific user data. the supportive archetype uses rule-based or statistical approaches to dialog management. the former follows a predesigned turn-taking conversation (e.g., meditation master, which aims to alleviate stress and sleeplessness). the latter implements a more natural back-and-forth message exchange that uses context information to generate responses. the second archetype employs flexible, probabilistic dialogs that preserve the conversation context. it follows a conversation pattern in which patients are screened for their condition(s), and guided and monitored throughout the specific treatment. the dialog adapts to the treatment's progress. the third archetype follows a similar principle. the context information is extracted from user profiling and conversation history (e.g., wysa monitoring therapy progress). it focuses on multiple mental/emotional states and on conversation as a means of therapy, offering more fluid dialogs (e.g., woebot). the archetypes mainly target individual users. current healthcare chatbots remain a supplementary service rather than a replacement for medical professionals.
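to make the interaction patterns described above concrete (closed option lists, a per-question rationale such as "why am i being asked this?", and an edit/backtrack recovery step), the following is a minimal sketch of a scripted, closed-option interview. the questions, options, rationale strings, and command names are illustrative assumptions, not taken from any of the reviewed chatbots.

```python
# minimal sketch of the scripted, closed-option interview pattern discussed above.
# all question content and commands ("why", "back") are hypothetical placeholders.

QUESTIONS = [
    {
        "id": "location",
        "text": "which part of your back is hurting?",
        "options": ["lower back", "upper buttock area", "somewhere else"],
        "why": "the location helps narrow down the list of potential causes.",
    },
    {
        "id": "duration",
        "text": "how long have you had this symptom?",
        "options": ["less than a week", "one to four weeks", "more than a month"],
        "why": "symptom duration distinguishes acute from chronic conditions.",
    },
]

def ask(question):
    """present a closed list of options; support 'why' and 'back' commands."""
    while True:
        print(question["text"])
        for i, opt in enumerate(question["options"], start=1):
            print(f"  {i}. {opt}")
        answer = input("> ").strip().lower()
        if answer == "why":       # transparency: explain the reason behind the question
            print(question["why"])
        elif answer == "back":    # error recovery: backtrack to the previous question
            return None
        elif answer.isdigit() and 1 <= int(answer) <= len(question["options"]):
            return question["options"][int(answer) - 1]
        else:                     # recovery: re-prompt on misunderstood input
            print("please pick a numbered option, or type 'why' or 'back'.")

def run_interview():
    answers, i = {}, 0
    while i < len(QUESTIONS):
        result = ask(QUESTIONS[i])
        if result is None:        # backtracking lets the user edit an earlier input
            i = max(i - 1, 0)
        else:
            answers[QUESTIONS[i]["id"]] = result
            i += 1
    return answers

if __name__ == "__main__":
    print(run_interview())
```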
the archetypes share the following levels of engagement: i) an active role in healthcare provision, emulating the functions of a healthcare professional; ii) facilitating access to healthcare services by matching users with service providers, or supporting the service delivery; and iii) providing users with information and products. some chatbots expose multiple roles. for example, florence instructs medicine intake during therapy, and provides information about a disease, for prevention. our findings confirm the existing evidence that practical, task-oriented conversation is dominant among healthcare chatbots [ ] , [ ] . except for the cbt archetype, chatbots do little to understand human social and emotional cues. concerning the use of medical vocabulary, our analysis revealed that the diagnostic and preventative chatbots use expert vocabulary in healthcare provision and should improve at making the terminology understandable to their users. as for the dialog initiative, preventative and therapeutic archetypes are more proactive than diagnostic archetypes, which react to the users' questions. future healthcare chatbots should improve the conversation's social and emotional aspects while adapting to users' health literacy. the lack of these aspects may create dissonance in user experience due to false expectations and lead to rejection [ ] , [ ] , [ ] . understanding users. the chatbots' natural language capabilities remain limited, leading to alternatives such as breaking down the dialog into multiple questions and constraining user choices to deal with this limitation. this is particularly the case in diagnosis chatbots or data collection tasks where the user cannot narrate their condition but instead is guided through questions. preventative and therapeutic chatbots offer more fluent dialogs by learning about their users and reusing conversation context. in analyzing the chatbots, we also noticed repair strategies [ ] that are not implemented correctly in the dialogs, sometimes missing even in their most basic form, e.g., chatbots preventing users from modifying previous inputs. another aspect we noticed is that the archetypes rely mostly on explicit user input from conversations. the exceptions are chatbots that accept pictures of affected skin areas for automatic processing. the above reveals that restricting interaction through closed-ended questions may improve the chatbot's understanding of the users, but errors related to human input or incorrect references can still emerge in interactions. improving error management is an emerging requirement for health chatbots. as for opportunities, chatbots can significantly benefit from continuous implicit data collection using the sensing technologies of smart devices, which are now widespread [ ] . accountability. we discovered that transparency in data collection is insufficient in the archetypes. only a minority of the analyzed chatbots give reasons for asking users for their data (figure ). concerning explainability, the chatbots do little to provide causes or explain the reasons behind their healthcare decisions. this can raise concerns among users about the chatbots' accountability. the implications are two-fold. firstly, public expectations of the chatbots need to be set explicitly, in advance. secondly, there is a need for greater transparency and explainability of the logic behind each archetype.
this means, in particular, i) explaining why and how the chatbot collects certain information, and ii) clarifying all the relevant decisions taken during the service provision (e.g., why an illness is diagnosed, a drug prescribed, or the workout intensity increased). healthcare provision. traditionally, the patient's condition is assessed during sporadic, short-term visits to healthcare facilities, and critical decisions may be affected by the uncertainty of the measurements. we observed that not many chatbots capitalize on the opportunities for continuity in the service provisioning. diagnostic chatbots provide one-time sessions with little to no information shared across sessions. therapeutic and health coaching archetypes offer prolonged, shared interactions with patients at their homes. currently, healthcare chatbots are standalone applications, independent of healthcare systems. concerning collaboration, the archetypes focus on individuals as a user-chatbot relation. group dynamics are largely missing, supported only by a handful of chatbots, either as user-chatbot-doctor relations (e.g., sensely) or by forming peer groups (e.g., fitwell). the chatbots need to address the above aspects through continuous service and better integration, opportunities not entirely leveraged by current healthcare systems [ ] , [ ] . future healthcare chatbots should also engage and mediate among multiple actors: patients, their social circles (family and friends), and caregivers. this assumes reusing social context by leveraging humans and intelligent agents, the environment, and the healthcare infrastructure [ ] . healthcare chatbots are yet to capitalize on the opportunities provided by conversational media to provide better dialog-based interactions appropriate to the task, and with the social intelligence to manage interaction in potentially vulnerable scenarios. our work provides the first step towards these goals by characterizing the emerging roles of chatbots in service provisioning and highlighting design aspects that require the community's attention. we believe our findings can guide researchers in identifying and validating dialog patterns appropriate to the existing archetypes, and practitioners in understanding the emerging use cases of chatbots in healthcare provision. future work should focus on understanding the actual medical value of the chatbots and their effects on health outcomes and user experience. our study does not attempt to evaluate an exhaustive list of existing health chatbots, but a representative sample of the current landscape. we acknowledge that the popularity metric used for selecting chatbots for full evaluation is an approximation, but given the number of chatbots evaluated, any potential misrepresentation should be mitigated. the evaluation did not address the effects of chatbots on health outcomes or the evidence supporting the health service. similarly, we did not analyze the content, nor assess it for accurate and evidence-based information. these are important aspects to be addressed in future work.
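as a small illustration of the report structure and explainability requirements discussed above, the following sketch shows a structured, self-explaining diagnosis report with the elements listed earlier (evidence strength, matching symptoms, recommended care, possible actions) plus an explanation per cause. field names and example values are our own illustrative assumptions, not taken from any specific chatbot.

```python
# minimal sketch of a structured, self-explaining diagnosis report (illustrative only).

from dataclasses import dataclass, field
from typing import List

@dataclass
class PotentialCause:
    name: str
    evidence_strength: str          # e.g., "strong", "moderate", "weak"
    matching_symptoms: List[str]    # reported symptoms that support this cause
    recommended_care: str           # e.g., "self-care", "see a general practitioner"
    possible_actions: List[str]
    explanation: str                # why this cause was included in the report

@dataclass
class DiagnosisReport:
    reported_symptoms: List[str]
    causes: List[PotentialCause] = field(default_factory=list)
    disclaimer: str = "this report does not replace a medical consultation."

    def explain(self) -> str:
        """render the report with the evidence behind each potential cause."""
        lines = [f"based on: {', '.join(self.reported_symptoms)}"]
        for c in self.causes:
            lines.append(
                f"- {c.name} ({c.evidence_strength} evidence): "
                f"matches {', '.join(c.matching_symptoms)}. {c.explanation} "
                f"recommended care: {c.recommended_care}."
            )
        lines.append(self.disclaimer)
        return "\n".join(lines)

report = DiagnosisReport(
    reported_symptoms=["lower back pain", "pain worse when sitting"],
    causes=[PotentialCause(
        name="muscle strain",
        evidence_strength="moderate",
        matching_symptoms=["lower back pain"],
        recommended_care="self-care",
        possible_actions=["rest", "gentle stretching"],
        explanation="pain localized to the lower back without other red flags.",
    )],
)
print(report.explain())
```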
survey of conversational agents in health
delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (woebot): a randomized controlled trial
towards interpersonal assistants: next-generation conversational agents
designing for health chatbots
approaches for dialog management in conversational agents
designing interactive systems to mediate communication between formal and informal caregivers in aged care
guidelines for human-ai interaction
what do we need to build explainable ai systems for the medical domain
what makes a good conversation?: challenges in designing truly conversational agents
discourse analysis
designing emotionally sentient agents
evaluating and informing the design of chatbots
chatbots, humbots, and the quest for artificial general intelligence
resilient chatbots: repair strategy preferences for conversational breakdowns
a study on machine learning enabled iot devices for medical assistance
kbot: knowledge-enabled personalized chatbot for asthma self-management
key: cord- -rlzbznav authors: unnikrishnan, vishnu; shah, yash; schleicher, miro; strandzheva, mirela; dimitrov, plamen; velikova, doroteya; pryss, ruediger; schobel, johannes; schlee, winfried; spiliopoulou, myra title: predicting the health condition of mhealth app users with large differences in the number of recorded observations - where to learn from? date: - - journal: discovery science doi: . / - - - - _ sha: doc_id: cord_uid: rlzbznav
some mhealth apps record user activity continuously and unobtrusively, while other apps rely by nature on user engagement and self-discipline: users are asked to enter data that cannot be assessed otherwise, e.g., on how they feel and what non-measurable symptoms they have. over time, this leads to substantial differences in the length of the time series of recordings for the different users. in this study, we propose two algorithms for wellbeing-prediction from such time series, and we compare their performance on the users of a pilot study on diabetic patients - with time series length varying between and recordings. our first approach learns a model from the few users on whom many recordings are available, and applies this model to predict the 2nd, 3rd, and subsequent recordings of users newly joining the mhealth platform. our second approach rather exploits the similarity among the first few recordings of newly arriving users. our results for the first approach indicate that the target variable of users who use the app for long is not predictive for users who use the app only for a short time. our results for the second approach indicate that few initial recordings suffice to inform the predictive model and improve performance considerably. not only do these devices help in diagnostics, by recording values of attributes related to health and subjective well-being; they also allow the disease to be monitored with only asynchronous involvement of the practitioner. self-monitoring of the disease thus contributes to patient empowerment, and also delivers precious data that can be used for personalization, i.e. for treatments tailored to the individual needs and characteristics. this potential requires adequate data to build upon. a major challenge of mobile health platforms that collect user inputs is that the amount of data users contribute can vary substantially.
as we reported in [ ] when analysing user recordings on the mobile health platform "trackyourtinnitus" [ ] , a minority of users interact intensively with the system and contribute a disproportionately large amount of data, while the majority of users contribute very few inputs. in this study, we investigate whether predictions can be made for this majority of users by learning a model on the few users who provide a lot of data to the system. also, differently from our work in [ ] , we focus here on a one-step-ahead forecast instead of classification. we propose an approach that learns from users who contribute long sequences of inputs to predict the subjective perception of wellbeing for users who contribute only short sequences of input data, including users that have very recently joined the platform. each user in the system is required to fill in an "end of day questionnaire", where he reports, among other things, the overall "feeling in control", the variable of prediction interest. these user-level timestamped observations therefore constitute one user-centric time series, the length of which varies depending on how long the user has been in the system and how well the doctor's recommendation of filling in the questionnaire at the end of every day has been followed. we denote the set of users with long sequences of recordings as u long and the users with few recordings as u short . our approach deals with the following three questions: -rq : how well can we predict the behaviour of users in u short given the data from the users in u long ? -rq : can we predict the entire sequence of observations of a user in u short with a model trained only on data from users in u long ? (i.e., does a model learned on data from users with long sequences transfer to those with short ones?) -rq : how can we incorporate early recordings of users in u short incrementally into the model to improve predictive performance? the paper is organised as follows: sect. introduces related literature, followed by sect. , which introduces the m-health application on which this work is based. section discusses our proposed solution, followed by a discussion in sect. , and closing remarks in sect. . in our work, we concentrate on time series in applications of health and wellbeing. the early study [ ] by madan et al. reported on the potential of mobile technologies to capture epidemiological behaviour change, including physiological symptoms like a runny nose, and mental health conditions like stress. for example, they found that the total communication of the affected persons decreased for the response "sad-lonely-depressed" (cf. [ ] for the definition of this response). while a change in communication intensity can be captured by bluetooth connection activity or the absence thereof, the information on how a person feels demands user inputs. ecological momentary assessments (ema) are a widespread tool for this purpose [ , ] . ema is an instrument for assessing "behavioral and cognitive processes in their natural settings" [ ] . from the technical perspective, ema recording is feasible and well-supported. for example, in their survey on sleep self-management apps [ ] , choi et al. list the recording of user-entered data as an important functionality, and stress that all investigated apps do support this functionality. however, next to the technical modalities, ema relies also on self-discipline and adherence. as mohr et al.
stress in [ ] , "although a number of small studies have demonstrated the technical feasibility of sensing mood, these findings do not appear to generalize". meanwhile, there are large studies involving ema recordings of more participants over longer time periods. however, the emphasis still seems to be on users who interact intensively with the mobile health application. in their insightful comparison of the results of ema recordings with the trackyourtinnitus mhealth app versus retrospective ratings of the users, only users with at least days of interaction were considered [ ] . for findings with the trackyourstress platform, which records ema together with geolocation, only users with at least recordings per day were considered [ ] . this provokes the question of whether users with few recordings belong to the same population as users with many recordings. in [ ] , probst et al. considered both users with few days of recordings and users with many days of recordings for their multi-level analysis (median number of days: , with range from to days), but demanded at least ema per day, each of them containing answers for the three ema items under study [ ] . in this work, we do not attempt to gain insights that pertain to a specific group of users, but rather to assess whether the ema of users with few recordings can be predicted by models learned on users with many recordings. the ema of mobile health app users constitute multivariate time series. the challenge posed by short time series is discussed by palivonaite and ragulskis in their work on short-term forecasting [ ] , where they associate the length of the time series with the reliability of longer-term forecasts. dynamic time warping (dtw) or one of its numerous enhancements can be used to compare time series of different lengths and exploit their similarity for learning. dtw is a very old method; cf. [ ] for an early citation to dtw by yfantis and elison, who proposed a faster alternative. such methods can be used to enhance algorithms like [ , ] , which make predictions by building a model for each time series but can also exploit information from similar time series. despite this potential, the amount of data per user in some mhealth applications is very small, so we opt for similarity-based methods that capitalize more on the similarity of values than on the ordering of the values, albeit both are taken into account. as part of two pilot studies on the empowerment of diabetes patients, a mobile crowdsensing framework was adjusted to implement the trackyourdiabetes mhealth platform [ , ] . figure summarizes the entire procedure of the app from the patient's point of view. the pilot studies were conducted in regions of spain and bulgaria, and involved patient recruitment and exposure to two variants of the app, under remote supervision by a practitioner. the platform comprises two mobile applications (i.e., native android and ios apps), a relational database, a restful api [ ] , and a web application. the mobile applications are only used by the patients, while the web application was used by the patients as well as their related healthcare professionals. the latter were enabled by the web application to monitor the data of the patients as well as to provide individual feedback if wanted or required. before starting interaction with the app, study participants registered with the platform by using the mobile apps or a web application.
after that, they have to fill out three registration questionnaires once: one registration questionnaire collects demographic data, one collects information on the self-management of the patients with his/her diabetes, and one captures the extent to which diabetes causes distress to the patient. there were ema recordings more than once a day, concerning physical activity and food intake, and ema recordings at the end of each day, using the end-of-day questionnaire items depicted in table . furthermore, individualised messages based on the given answers of the daily assessments were provided with the goal to better motivate the patients in using the platform. the healthcare professional(s) responsible for the participants could also provide individualised feedback. finally, a chatbot was integrated, which could be used by the patients to discuss questions on their diabetes. for the analysis of the proposed approach, we concentrated on the bulgarian pilot study and investigated solely the user inputs to the end-of-day questionnaire; no further features were considered. we investigate a prediction problem on timestamped data, transferring a predictor learned on the data of one set of users, u long , to another set of users u short . in all cases, our goal is to predict many observations of a user, not just the next one, as is typical in many time series prediction problems. this section offers a brief overview of the terms used in this work and their exact definitions, which is followed by a broader description of our workflow in sect. . . user sequences: each user p who uses the mhealth app generates a time-ordered sequence of observations x p,t , where p is the user and t denotes time. we distinguish between users with short sequences of observations, constituting a set u short , and users with long sequences of observations, constituting a set u long . for the partitioning of users into these two strata, we consider a threshold τ length . in our experiments, we set τ length on the basis of the user-observations distribution, which has shown a gap. in distributions that follow a power law, τ length serves to separate between the short head and the long tail. more generally, we may decide to place into u short those users who have very recently started their interaction with the app and thus have contributed only a few initial observations. observations: an observation is a multi-dimensional vector of values from a feature space f . in our application scenario, an observation is an ema recording comprised of answers to questions from a questionnaire. accordingly, an observation is a mix of numerical and categorical variables. handling categorical data: a term frequency-inverse document frequency (tf-idf) inspired approach: before training the models, it is important to consider the exact way in which categorical attributes in the input data are used. of the various questions in the questionnaire answered by the users, the questions that generate categorical data (chosen from a drop-down list) need to be treated to account for the fact that not all answers are equally likely. compared to simply using a standard method like one-hot encoding, this step brings the answer closer to the user's history, e.g., by more accurately capturing the information that a user who commonly answers a question with "no" has said "yes", even if "yes" frequently appears in the dataset.
we treat the answers to this categorical data as 'words', and each session where the questionnaire is answered as a 'document'. during preprocessing, given the exact answer chosen by a user during a particular day, we replace the binary flag marking the presence of that word with a new value that is adjusted to reflect the amount of "surprise" in seeing that data point given the user, through the use of the tf-idf (see [ ] ) inspired formula $\text{preprocessed value} = f_{term} \cdot \left(-\log \frac{n_{term}}{N}\right)$, where $f_{term}$ can only be binary, since the categorical answer only picks one term from a list of several, $n_{term}$ is the number of past sessions in which the term appears, and $N$ is the number of sessions in the user's history. the inverse document frequency component thus measures how often the term has appeared in the user history. core learning method: given data $P_p$ for all users p ∈ u long for time points $1 \dots t$, we have $P_p = \{x_{p,1}, \dots, x_{p,t}\}$. using this data, we create a linear regression model that, for each possible $i \in 1 \dots t-1$, learns to predict the target variable for time point $i+1$. naturally, since there is no known label for the last time point t, each user p with a sequence length of t only provides $t-1$ time points of training data. this model is only used for predicting the labels for the observations $\{x_{p,1}, \dots, x_{p,t-1}\}$ for all users p ∈ u short . augmented method: we augment the above method by creating predictions specific to the users in u short : in addition to the above model, which only learns on the users of u long , we add an additional k-nn regression model that is trained only on the user's own history of observations. this means that given an observation $x_{p,t}$, we can generate predictions for $x_{p,t+1}$ from two models, the model trained on all users in u long , and additionally the k-nn regression model that has only been trained on the observations of the user p seen so far, i.e. $x_1 \dots x_{t-1}$ (note: the training data for p ∈ u short ends one step earlier, because the last observed target value is used as the label for the preceding training point, and the true label of the most recent observation has not been observed yet). the basic workflow we propose has a preliminary step and two components. the preliminary step is designed to check that the task is indeed learnable, and success at this stage can ensure that the further steps in the workflow are applicable. for this, instead of training a model on only data from users in u long , a model is trained on % of all data, and the performance is analysed to confirm that the model can learn given the data. by framing the problem as a regression problem and not as a time series forecast, we avoid the problem of having insufficient data to train a time series forecasting model. this model can unfortunately not be used as a baseline to compare against, since it does not learn on the same amount of data as the model learning only on u long , and also because the number of data points available for testing over users in u short is very small (often only a couple of observations). however, the performance of this model can still be considered a benchmark for the upper limit of performance for the transfer learning model. in this workflow, we find a subset of the dataset d comprising only the data x p,t , where p ∈ u long . this creates a model trained only on the data from users with long sequences, the performance of which is tested on users of u short . it is important to remember that the model has the challenging task of making predictions for users that have never been seen by the system, and predictions for them are made based only on what has been learned from the users in u long .
this is arguably more challenging than predicting unseen observations for users who have already contributed observations to the training set. additionally, since these users have not adhered to the instructions of the physician to use the application for the prescribed period of months, it is possible that these users differ somehow in the expression or the perception of the disease. however, it is still possible that a model learned on those data points from long users brings a modest predictability to the disease development of users in u short . similarly to the model introduced above, we learn to predict the numeric value of the target variable for the next observation given the questionnaire answers of the current observation (including the current value of the target variable). a graphical overview of the workflow is shown under 'basic workflow' of fig. . if the users in u short are indeed different from the users in u long , then using a model that transfers the parameters learned on u long is not expected to bring reliable predictions to the users in u short . however, since the users in u short do not bring enough data to train complex models, only simple techniques can be used to try to incrementally improve predictive performance over users in u short by capturing the idiosyncratic patterns in the user's disease development/answering style. this design aims to balance the trade-off between using as much data as possible to learn how the disease develops and staying close to the idiosyncratic ways in which the user may answer questions. in this work, we propose the use of a k-nearest neighbours regressor trained over the user's own history, the predictions of which are used to augment the predictions from the u long model, weighted by their past errors (similarly to [ ] ). restricting the k-nn regressor to the user's own sequence also has the unintended consequence of out-of-the-box support for data privacy, something that is especially relevant in the medical domain. during use, the k-nn regressor is incrementally trained on the user sequence as more of it becomes available, and the errors are recorded for comparison to the standard u long model. figure shows an overview of the model training process with the k-nn augmentation component. we describe the dataset of our evaluation in subsect. . , and then explain in subsect. . how the number of users with short and long sequences affects the prediction tasks and the settings of k-nn in the augmented workflow. we evaluate using mean absolute error (mae). the results of the proof-of-concept experiment are in subsect. . , while the results for the basic workflow and the k-nn-augmented workflow are in subsect. . . for our evaluation, we used the dataset of the bulgarian pilot study. this dataset contains observations from study participants. while including the users from the pilot study in spain would be desirable, a model that learns on the combined data of the two pilots was not built for two reasons: (a) the two countries are different in the dominant diabetes type that the users have, and (b) many users in the spain pilot use continuous blood sugar measuring devices, strongly influencing the accuracy of the "self-assessed" blood sugar estimations, and therefore, the "feeling in control". we set τ length = days, whereupon of the users belong to u long ( + days) and users are in u short ( - days), after eliminating users with less than days of data.
we denote this dataset as l +s dataset hereafter, to stress the number of users per length-stratum. figure depicts the number of days of interaction for all users. it can be observed that there is a clear separation between users in u long compared to the rest of the users. of the variables of the eod questionnaire filled by the pilot study participants (cf. sect. ), the target variable is the th one on table , i.e. each user's self-reported 'feeling in control', on a scale of to . we denote this variable as 'eod feel' hereafter. for the proof-of-concept step in subsect. . , we train a predictor on the first % of the observations of the users in u long of l +s dataset and predict the subsequent % observations. as can be seen on fig. , the users in u long contribute unequally to learning: user # contributes more than (out of ca. ) observations to the training dataset, while user # contributes less than (ie half as many). similarly, we predict the eod feel value of more than observations of user # and ca. of user # . for the basic workflow of subsect. . , the prediction task is to predict all observations of the users in u short of the l +s dataset, without having seen any observations on them during training. this amounts to predictions. for the k-nn augmented workflow, some observations of each user in u short of the l +s dataset are disclosed and used for augmentation of the model learned on all of the u long observations in the l +s dataset. user # has less than observations, user # has (cf. fig. ). this imposes an upper limit to k: if we set k = , we cannot do any predictions on user # . on the other hand, k-nn based regression needs at least observations per user to learn. larger values of k allow for a more robust regression model and make the prediction task easier, since less eod feel values are predicted. to investigate whether the very few first observations on a user can inform a model learned on u long , we have set k = . this amounts to predictions. the goal of this experiment is to check whether the prediction problem is indeed learnable, in the sense that we can derive a useful prediction model. figure shows the performance of the proof-of-concept regression model for the first prediction task of subsect. . on l +s dataset, learning on the first % of all user observations (all), and accordingly on the first % of the observations in u long (l), resp. u short (s). for "all" (leftmost part), mae remains around %, decreasing slightly within "l" (u long ) and increasing slightly within u short . however, mae within "s" ( u short ) is rather unreliable, since there are less than two observations per user in the testset (more precisely: . ). hence, these mae values serve only as lower limit for the errors of the transferred models. since we have a baseline (albeit weak, since errors for u short are not reliable) for the performance of a model on the data from all users, we can now investigate the transfer learning case where the model is only learned on the users of u long . as already described above, there are two workflows that use such a model, a more basic workflow that uses a model learned over u long only, and another model that augments the basic workflow with a user-specific k-nearest observations regressor. the models are all evaluated against the absolute errors they make in their predictions. the 'mean' in the mean absolute error may either be computed over all predictions that a model has made, or may be restricted to the predictions for particular users. 
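before turning to the results, the following is a compact sketch of the workflow described above: the tf-idf-inspired categorical weighting, the linear regression trained on u long , and the per-user k-nn regressor whose predictions are combined with the transferred model by weighting on past errors. the function names, the exact error-weighting scheme, and the scikit-learn usage are our own illustrative assumptions, not the authors' implementation.

```python
# compact sketch of the transfer + k-nn augmentation workflow (illustrative only).

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

def categorical_weight(answer, history):
    """tf-idf-inspired value 1 * -log(n_term / N) over the user's own history
    (applied when building the feature vectors; shown separately for clarity)."""
    n = max(len(history), 1)
    n_term = max(sum(1 for h in history if h == answer), 1)
    return -np.log(n_term / n)

def make_pairs(features, targets):
    """turn one user's sequence into (x_i -> target_{i+1}) training pairs."""
    return features[:-1], targets[1:]

def train_long_model(long_users):
    """linear regression trained on all (x_i, target_{i+1}) pairs of u_long users;
    feature vectors are assumed to already include the current target value."""
    xs, ys = [], []
    for feats, targs in long_users:
        x, y = make_pairs(feats, targs)
        xs.append(x)
        ys.append(y)
    model = LinearRegression()
    model.fit(np.vstack(xs), np.concatenate(ys))
    return model

def predict_short_user(long_model, feats, targs, k=2):
    """one-step-ahead predictions for a u_short user, combining the transferred
    model with a per-user k-nn regressor, weighted by their cumulative errors."""
    preds, err_long, err_knn = [], 1e-6, 1e-6
    for t in range(1, len(feats)):
        x_now = feats[t - 1].reshape(1, -1)
        p_long = float(long_model.predict(x_now)[0])
        if t - 1 >= k:                      # the k-nn needs at least k past pairs
            knn = KNeighborsRegressor(n_neighbors=k)
            knn.fit(feats[: t - 1], targs[1:t])
            p_knn = float(knn.predict(x_now)[0])
        else:
            p_knn = p_long                  # fall back to the transferred model
        # error-weighted average: the model with the smaller past error gets more weight
        w_long, w_knn = 1.0 / err_long, 1.0 / err_knn
        preds.append((w_long * p_long + w_knn * p_knn) / (w_long + w_knn))
        err_long += abs(p_long - targs[t])
        err_knn += abs(p_knn - targs[t])
    return np.array(preds)
```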
given the rather short sequence lengths of the users in u short , it is necessary to not rely on point-estimates like means, but consider the 'spread' in the errors as well. we therefore present box plots over all the prediction errors for the basic and k-nn augmented workflow. the entire test set contains observations for which predictions are required. basic workflow. in this workflow, instead of learning a model over data from all users, as described in sect. . , we will learn a model only on those users who have contributed more than days of data, the necessary criterion for their addition to the set u long . figure shows a box plot of the absolute prediction errors for the transferred model. compared to the basic model in sect. . , the mae over all predictions for all users has increased from . to . (indicated by the green triangle). however, since not all users in u short have the same sequence lengths, the mae is biased towards the users with longer sequences. the blue dots inside the box plot shows the mae for each user separately, and we can see that the users who are best predicted have errors as low as , with the worst-predicted users showing mae in excess of . the mean being closer to indicates that the well-predicted users are indeed the ones with longer sequences. this indicates that they are more similar to the users in u long than other shorter members of u short . figure shows a box plot of the absolute prediction errors for the k-nn regressor, along with comparisons against the u long model's errors. the box plot on the far right shows the errors in the case where the predictions of each method are combined as a weighted average on their cumulative errors for the user to form a final prediction. since the users in u short can have as few as observations, our choice of k is quite strongly limited to very low numbers, as the k-nn regressor cannot create predictions until it sees at least k observations. in these cases, the k-nn is assumed to make the same prediction as the linear regression model over u long , since it is necessary to compare the errors of the two models for the same number of predictions. it can be seen in fig. , the k-nn model does indeed show lower mean and median errors, indicated by the green triangle and the line in the box plot. however, it can also be seen that the worst-case performance of the k-nn model is worse than that of the linear regression model. the roughly similarly sized gaps between the mean and the median errors in the k-nn and the linear regression models indicate that both models sometimes make large errors, albeit in different directions. combining the predictions from both models does seem to mitigate these worst-case errors, since the mean and the median absolute errors are observed to be very close, at around . in addition to the boxplots of the error itself, fig. shows how the error develops over time for users in u short as they stay longer in the system. the x-axis shows the observation number, with the mae on the y-axis. the mae at each time point is averaged over the individual prediction errors over all users at that time point. until the k th observation, the k-nn predictor does not generate predictions, but we have used the linear regression model prediction errors in order to not unfairly favour any algorithm. from the rd observation, however, we see that the user-level k-nn predictor almost always outperforms the linear regression model (and therefore the weighted average model). 
it is also noteworthy that until the th time point, the error-weighted combination of both models is very close to the k-nn model. this shows that augmenting the predictions of the basic workflow with the k-nn regressor does improve performance. the results beyond the th observation get progressively less reliable since all users in u long have at least observations, but the number of users contributing to the averages after time point get unreliable, though it is possible that users in u short are more and more predictable given the history of their own observations with time. in this work, we studied if the data from users of a diabetes self-management app with more than days of data (u long ) can be used to infer something about the future of less intensive users with less data. since neither the number of patients (n = ) nor the number of observations for the longest-sequence user ( ) is very long, we investigate simpler models like linear regression. the model is trained to predict the next observation for user-reported "feeling in control", the last question of the end of day questionnaire, given the answers to all questions of end of day questionnaire for the current observation. the categorical information in the dataset is handled using a method inspired by tf-idf to capture the 'surprise element' in an answer given a user. i.e., when a user answers a question like (s)he usually does, that answer gets a smaller weight than if the answer is unexpected. further, we investigate whether transfer learning can be used to learn a model on the users of u long in order to make predictions for the observations of users in u short . we saw that the transferred model predictably shows a higher error, which can be mitigated by combining the predictions of the u long model with a k-nearest neighbours regressor over the patient's own past data. the short sequences necessitate that the k is limited to quite low values, but the predictor that combines the predictions of both models does eliminate some extreme errors, bringing the mean and the median errors closer. the primary threat to validity of this work is the size of the dataset from which the conclusions have been drawn. the large disparity between the lengths in u long and u short make further analysis of the k-nn augmented predictor less reliable, making the findings more qualitative than quantitative. although two pilots exist from which data can be analysed, this study focused the investigation only on data from bulgaria because the users for the two studies are drawn from different populations (the proportion of type diabetics is very different, and the spanish pilot users had continuous glucose monitoring devices implanted). additionally, the mhealth application collects more data from the users, of which the eod questionnaire is only one. a system with either more users or longer observation sequences may enable the study of how other dimensions not measured by the eod questionnaire may affect the subjective "feeling in control", or allow for the use of more sophisticated models than simple linear regression. it is also highly likely that x t might not be best predicted by the value of x t− , but rather by some larger or even user-dependent lag, depending on external factors like weekends, or user-specific factors like exercise routine. 
the testing of this parameter is challenging at the moment because it further decreases the amount of data available for testing the predictions over users in u short , or adds more features and complexity in the context of already scarce data. if such a large disparity did not exist between the lengths of users in u long and u short , it would also be possible to investigate the aspects that characterise users who transition from u short to u long .
exploiting entity information for stream classification over a stream of reviews
smartphone applications to support sleep self-management: review and evaluation
validity and reliability of the experience-sampling method
combining mobile crowdsensing and ecological momentary assessments in the healthcare domain
social sensing for epidemiological behavior change
personal sensing: understanding mental health using ubiquitous sensors and machine learning
short-term time series algebraic forecasting with internal smoothing
does tinnitus depend on time-of-day? an ecological momentary assessment study with the "trackyourtinnitus" application
mobile crowdsensing in healthcare scenarios: taxonomy, conceptual pillars, smart mobile crowdsensing services
machine learning findings on geospatial data of users from the trackyourstress mhealth crowdsensing platform
prospective crowdsensing versus retrospective ratings of tinnitus variability and tinnitus-stress associations based on the trackyourtinnitus mobile platform
requirements for a flexible and generic api enabling mobile crowdsensing mhealth applications
data mining
ecological momentary assessment (ema) in behavioral medicine
entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity
vector interpolation for time alignment in speech recognition
acknowledgements. this work was funded by the chrodis plus joint action, which has received funding from the european union, in the framework of the health programme ( - ).
key: cord- -i a jwck authors: jiang, bo; lu, zhigang; liu, yuling; li, ning; cui, zelin title: social recommendation in heterogeneous evolving relation network date: - - journal: computational science - iccs doi: . / - - - - _ sha: doc_id: cord_uid: i a jwck
the appearance and growth of social networking brings an exponential growth of information. one of the main solutions proposed for this information overload problem is recommender systems, which provide personalized results. most existing social recommendation approaches consider relation information to improve recommendation performance in a static context. however, relations are likely to evolve over time in a dynamic network. therefore, temporal information is an essential ingredient for making social recommendations. in this paper, we propose a novel social recommendation model based on an evolving relation network, named soerec. the learned evolving relation network is a heterogeneous information network, where the strength of the relation between users is a sum of the influence of all historical events. we incorporate temporally evolving relations into the recommendation algorithm. we empirically evaluate the proposed method on two widely-used datasets. experimental results show that the proposed model outperforms the state-of-the-art social recommendation methods. the last decades have witnessed the booming of social networking such as twitter and facebook. user-generated content such as text, images, and videos has been posted by users on these platforms.
social users are suffering from information overload. fortunately, recommender systems provide a useful tool, which not only helps users to select the relevant part of online information, but also discovers user preferences and promotes popular items, etc. among existing techniques, collaborative filtering (cf) is a representative model, which attempts to utilize the available user-item rating data to make predictions about the users' preferences. these approaches can be divided into two groups [ ] : memory-based and model-based. memory-based approaches [ , , ] make predictions based on the similarities between users or items, while model-based approaches [ , ] design a prediction model from rating data by using machine learning. both memory-based and model-based cf approaches have two challenges: data sparsity and cold start, which greatly reduce their performance. in particular, matrix factorization based models [ , ] have gained popularity in recent years due to their relatively high accuracy and personalized advice. existing research works have contributed improvements in social recommendation tasks. however, these approaches only consider static social contextual information. in the real world, knowledge is often time-labeled and will change significantly over time. figure shows the entire social contextual information over time which can be derived from links on social networks. user jack posts message m , which mentions users tom and eric, at time point t . subsequently, user jack posts message m , which mentions user eric again, at time point t . meanwhile, message m is retweeted by user ellen at time point t . we observe that new social actions are often influenced by related historical behaviors. in addition, historical behaviors have an impact on current actions, and the impact strength decreases with time. on the other hand, we notice that the evolving relation network is very sparse, which greatly reduces the recommendation performance. in order to deal with data sparsity, we leverage network embedding technology, which has contributed improvements in many applications, such as link prediction, clustering, and visualization. in this work, we propose a novel social recommendation model based on an evolving relation network, named soerec, which leverages the evolving relation network and network embedding techniques. the proposed method explicitly models the strength of relations between pairs of users learned from an evolving relation network. to efficiently learn heterogeneous relations, network embedding is employed to represent relations in a unified vector space. we conduct experiments on two widely-used datasets and the experimental results show that our proposed model outperforms the state-of-the-art recommendation methods. the main contributions of this paper are as follows: -we construct a dynamic, directed and weighted heterogeneous evolving network that contains multiple object and link types from social networks. compared with a static relation graph, the evolving graph can more precisely measure the strength of relations. -we propose a novel social recommendation model by jointly embedding representations of fine-grained relations from historical events based on the heterogeneous evolving network. -we conduct several analysis experiments with two real-world social network datasets; the experimental results demonstrate that our proposed model outperforms state-of-the-art comparison methods. the rest of this paper is organized as follows. section formulates the problem of social recommendation.
section proposes the method of social recommendation based on the evolving relation network to recommend the candidate users. section presents experimental results of recommendation. finally, sect. reviews the related work and sect. concludes. we briefly review the related works from two lines in this section: one on network embedding and the other on social recommendation. network embedding. in recent years, network embedding has been extensively studied to learn a low-dimensional vector representation for each node, implicitly capture meaningful topological proximity, and reveal semantic relations among nodes. early-stage studies only focus on learning embedding representations of the network structure [ , , , ] . subsequently, incorporating external node information, such as text content and label information, was shown to boost the quality of network embedding representations and improve learning performance [ , , , , , ] . network embedding can indeed alleviate data sparsity and improve the performance of node learning. therefore, this technique has been effectively applied to tasks such as link prediction, personalized recommendation and community discovery. social recommendation. recommender systems are used as an efficient tool for dealing with the information overload problem. various methods of social recommendation have been proposed from different perspectives in recent years, including the user-item rating matrix [ ] , network structure [ ] , trust relationships [ , , , ] , individual and friends' preferences [ , ] , social information [ ] and combinations of different features [ , ] . the above social recommendation methods are proposed based on collaborative filtering. these methods all focus on fitting the user-item rating matrix using low-rank approximations, and also use all kinds of social contextual information to make further predictions. most of the studies that use both ratings and structure deal with static snapshots of networks, and they do not consider the dynamic changes occurring in users' relations. incorporating temporally evolving relations into the analysis can offer useful insights about the changes in the recommendation task. the intuition behind this is that there are two commonly accepted observations in the real world: ( ) the current behavior of a user is influenced by all his/her historical patterns; ( ) a behavior with an earlier generation time has a smaller influence on the user's current behavior, while one with a later generation time has a greater influence. therefore, we first formally define the concept of an evolving relation network as a graph $g = (N_u \cup N_i, E)$, where $N_u$ is the set of vertices representing users, $N_i$ is the set of vertices representing items, and $E$ is the set of edges between the vertices. the types of edges can be divided into user-user and user-item relationships with temporal information. hence, g is a dynamic, directed and weighted heterogeneous evolving network. from the definition, we can see that each edge not only is an ordered pair from one node to another node, but also has a time-dependent weight. in order to measure the strength of relations between two node objects in the heterogeneous evolving network g, we introduce the concept of evolving strength, which is formally defined as follows:
an event sequence Γ between two nodes is a list of events {e , e , · · · , e n }, ordered by their timestamps an event corresponding to an edge. thus, the strength of evolving relations denoted by f is the sum of individual event influence. we formulate the problem of social recommendation as a ranking based task in this work, as follows: given a heterogeneous evolving network g at time t, and a target user u i , and a candidate set of items Ψ , we aim to generate a top k ranked list of items Ω ∈ Ψ for u i at time t + according to the target user's preference inferred from historical feedbacks. let r ∈ r m ×n be the rating matrix with m users and n items. the (i, j)-th entry of the matrix is denoted by r ij that represent the rating of user i for item j. u ∈ r k×m and v ∈ r k×n be user and item latent feature matrices respectively, where k is the dimension of latent factors. the preference of i-th user is represented by vector u i ∈ r k× and the characteristic of j-th item is represented by vector v j ∈ r k× . the dot product of u and v can approximate the rating:r ≈ u t v j . recommendation based on probabilistic matrix factorization (pmf) [ ] solve the following problem can avoid overfitting, || · || f denotes the frobenius norm of the matrix. incorporating the knowledge from present and historical behavior data can accurately measure the strength of influence, as shown fig. . in this work, we model the strength of relation between users as a sum of the influence of each event by multiplying a weight. the weight is calculated by a function, called decay function. since the influence between users can't be less than zero in social networks, the weight ranges from to and decreases with the event's existing time. thus, we formalize the decay function d ij (t) with timestamped information as follows: where t is the current time, t i is the generation time of historical event, and λ is a parameter which controls the decay rate. through the analyses in the following experiments in the paper, we set the parameter λ as . . based on the influence of historical events, we can measure the current strength of social relation between users as follows: where i e(ψ,ti) is a parameter which controls the weight of different events. to simplify the model, we assume that the importance of any events is equal. the learned evolving relation network has three characteristics: ( ) a weighted and directed graph; ( ) a sparsity graph; ( ) heterogeneous information network. in order to learn the evolving relation network, we employ large-scale information network embedding (line) [ ] model to simultaneously retain the local and global structures of the network. in particular, we leverage the line model to learn users' embedded representations of the evolving relation network the firstorder proximity and the second-order proximity. as shown fig. , the detailed process is demonstrated as follows. user relation with first-order proximity. the first-order similarity can represent the relation by the directly connected edge between vertices. we model the joint probability distribution of users u i and u j as the first-order similarity p (u i , u j ). the similarity can be defined as follows: where − → u i ∈ r d is the low-dimensional vector representations of vertices u i . the empirical distribution between vertices u i and u j is defined as follows: where w = (ui,uj )∈e w ij , and w ij is the relation strength of the edge (u i , u j ) measured by eq. ( ) . 
to preserve the first-order proximity in the evolving relation network, we use the kl-divergence to minimize the distance between the joint probability distribution and the empirical probability distribution, which (omitting constants) yields the objective $O_1 = -\sum_{(u_i,u_j) \in E} w_{ij} \log p_1(u_i, u_j)$. user relation with second-order proximity. the second-order proximity assumes that vertices sharing many connections to other vertices are similar to each other. in this work, we assume that two users with similar neighbors have high similarity scores between them. specifically, we consider each user vertex as a specific "context", and users with similar distributions over the "contexts" are assumed to be similar. thus, each user vertex plays two roles: the user vertex itself and the specific "context" of other user vertices. we introduce two vectors $\vec{u}_i$ and $\vec{u}_i'$, where $\vec{u}_i$ is the representation of $u_i$ when it is treated as a vertex, and $\vec{u}_i'$ is the representation of $u_i$ when it is treated as a specific "context". for each directed user edge $(u_i, u_j)$, we first define the probability of "context" $u_j$ being generated by user vertex $u_i$ as $p_2(u_j \mid u_i) = \frac{\exp(\vec{u}_j'^T \vec{u}_i)}{\sum_{k=1}^{K} \exp(\vec{u}_k'^T \vec{u}_i)}$, where $K$ is the number of user vertices or "contexts". the empirical distribution of "contexts" generated by user vertex $u_i$ is defined as $\hat{p}_2(u_j \mid u_i) = \frac{w_{ij}}{d_i}$, where $w_{ij}$ is, as before, the weight of the edge $(u_i, u_j)$ and $d_i$ is the out-degree of vertex $u_i$, i.e. $d_i = \sum_{k \in N(i)} w_{ik}$, with $N(i)$ the set of out-neighbors of $u_i$. to preserve the second-order user relation, the following objective function is obtained by utilizing the kl-divergence: $O_2 = -\sum_{(u_i,u_j) \in E} w_{ij} \log p_2(u_j \mid u_i)$. combining first-order and second-order proximities. to embed the evolving network by preserving both the first-order and second-order proximities, the line model minimizes the objective functions $O_1$ and $O_2$ separately and learns two low-dimensional representations for each user vertex. then, the two low-dimensional representations are concatenated into one feature vector to simultaneously preserve the local and global structures of the evolving relation network. finally, each user vertex $u_i$ is represented as $\vec{u}_i \in \mathbb{R}^{d_1 + d_2}$. simultaneously incorporating users' explicit and implicit relations can boost the ability of social recommendation. as mentioned above, the line model can learn users' embedded representations, where the first-order proximity corresponds to the strength of the explicit relation and the second-order proximity corresponds to the strength of the implicit relation. hence, the fine-grained relation measure can better predict user ratings by encoding both the first-order and second-order relationships among users. after performing the line model, we obtain the users' embedded representations. we then measure the fine-grained relation among users as the inner product of the representations, $s_{ij} = \vec{u}_i^T \vec{u}_j$, where $\vec{u}_i$ and $\vec{u}_j$ denote the low-dimensional feature representations of users $u_i$ and $u_j$, respectively. in this work, the relation strength $w_{ij}$ can be viewed as a coarse-grained relation value between users $u_i$ and $u_j$. compared to the coarse-grained measure, the fine-grained measure $s_{ij}$ is more informative, and can effectively distinguish the importance of recent and old events among users. in other words, the fine-grained measure can deduce the strength of a latent relation based on neighborhood structures even when two users have no explicit connection. the fact of the matter is that, in real-world situations, a user's decision making is influenced by his/her own preferences and close friends.
specifically, on the one hand, users often have different preferences for different items. on the other hand, users are likely to accept their friends' recommendations. thus, we assume that the final rating of user $u_i$ for item $v_j$ is a linear combination of the user's own preference and his/her friends' preferences, and the predicted rating can be defined as $\hat{r}_{ij} = \eta\, u_i^T v_j + (1-\eta) \sum_{k \in S(u_i)} s_{ik}\, u_k^T v_j$, where $S(u_i)$ is the set of most intimate friends of user $u_i$. in the above equation, the first term corresponds to the predicted rating based on the user's own preferences, while the second term corresponds to the predicted rating based on the preferences of his/her friends, and $\eta$ is a parameter that controls the relative weight between the user's own preferences and the friends' preferences. the ratings of users for items are generally represented by an ordered set, such as discrete values or continuous numbers within a certain range. in this work, without loss of generality, the differences in the users' individual rating scales are handled by normalizing ratings with the function $f(x) = (x - r_{min}) / (r_{max} - r_{min})$, where $r_{max}$ and $r_{min}$ represent the maximum and minimum ratings, respectively, so that $f(x)$ falls in the $[0, 1]$ interval. meanwhile, we use the logistic function $g(x) = 1/(1 + e^{-x})$ to limit the predicted ratings $\hat{r}_{ij}$ to the range $[0, 1]$. based on this, the task of social recommendation is likewise to minimize the predictive error. hence, the objective function $L$ of the evolving relation embedding recommendation algorithm is formalized as the regularized sum of squared errors between the normalized observed ratings and the predictions $g(\hat{r}_{ij})$, where $S(u_i) = \{k \mid s_{ik} \geq \epsilon\}$ is the set of most intimate friends of user $u_i$, and the parameter $\epsilon$ is the threshold on the close-relation value. we adopt stochastic gradient descent (sgd) to find a local minimum of $L$ and learn the latent feature vectors $u_i$ and $v_j$. the partial derivatives of the objective function $L$ with respect to $u_i$ and $v_j$ are computed accordingly, where $g'(x) = e^{-x}/(1 + e^{-x})^2$ is the derivative of the logistic function $g(x)$. in this section, we first describe the experimental datasets and metrics. we then present the baselines and the experimental settings. finally, we give the experimental results and analyze them. to evaluate the proposed model, we use two real-world datasets for this task: weibo and last.fm. weibo dataset. the data is collected from sina weibo, which is the most popular microblogging platform in china. it includes basic information about messages (time, user id, message id, etc.), mentions (user ids appearing in messages), forwarding paths, and whether messages contain embedded urls or event keywords. in addition, it also contains a snapshot of the following network of users (based on user ids). last.fm dataset. this dataset was obtained from the last.fm online music system. its users are interconnected in a social network generated from last.fm "friend" relations. each user has a list of most-listened music artists, tag assignments, i.e. tuples [user, tag, artist], and friend relations within the dataset's social network. each artist has a last.fm url and a picture url. for the two datasets, user-user relations are constructed from following or bi-directional friendships between social network users, and user-item relations are constructed from the users' posting or listening behavior. the statistics of the two datasets are summarized in the table. we use the mean absolute error (mae), root mean square error (rmse) and the average precision of top-k recommendation (average p@k) to evaluate the performance of the recommendation algorithms.
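to tie together the prediction and evaluation pieces just described, here is a small sketch of the min-max rating normalization, the logistic squashing of the fused own-preference/friend-preference score, and the mae/rmse metrics (average p@k is omitted for brevity). the fused score is written as η u_i·v_j + (1-η) Σ_k s_ik u_k·v_j over the intimate-friend set, following the verbal description above rather than a formula reproduced verbatim from the paper, and all parameter values are illustrative.

```python
import numpy as np

def normalize(r, r_min=1.0, r_max=5.0):
    """Map raw ratings onto [0, 1]; r_min and r_max depend on the dataset."""
    return (r - r_min) / (r_max - r_min)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(u, v, friends, strengths, eta=0.5):
    """Fused prediction for one (user, item) pair.

    u: latent vector of the target user; v: latent vector of the item;
    friends: dict friend_id -> latent vector for the intimate-friend set;
    strengths: dict friend_id -> fine-grained relation strength s_ik;
    eta: trade-off between own and friends' preferences (value illustrative).
    """
    own = u @ v
    social = sum(strengths[k] * (friends[k] @ v) for k in friends)
    return logistic(eta * own + (1.0 - eta) * social)

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# toy check: one user, one intimate friend, one item, latent dimension 3
rng = np.random.default_rng(1)
u, v, f = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)
pred = predict(u, v, friends={"f1": f}, strengths={"f1": 0.8})
print(pred, mae([normalize(4.0)], [pred]), rmse([normalize(4.0)], [pred]))
```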
according to their definitions, a smaller mae/rmse or a bigger average p@k value means better performance. for each dataset, two different proportions of the ratings are selected randomly as the training set and the rest as the test set. we repeat the experiments several times and report the average performance. in order to evaluate the effectiveness of our proposed recommendation algorithm, we select the following recommendation algorithms as comparison methods: -pmf [ ]: the method adopts a probabilistic linear model with gaussian distributions, and the recommendations are obtained by relying only on the user-item rating matrix. -sorec [ ]: the method integrates the social network structure and the user-item rating matrix based on probabilistic matrix factorization. however, the algorithm ignores the temporal changes of relations between users. -rste [ ]: the model fuses the users' tastes and their trusted friends' favors together for the final predicted ratings. similarly, the method doesn't consider the changes of trust relations over time. -socialmf [ ]: the model integrates a trust propagation mechanism into pmf to improve the recommendation accuracy. however, the algorithm represents the feature vector of each user only by the feature vectors of his direct neighbors in the social network. -trustmf [ ]: the model proposes social collaborative filtering recommendations by integrating sparse rating data and a social trust network. the algorithm maps users into a low-dimensional truster feature space and a trustee feature space, respectively. -sodimrec [ ]: the model simultaneously exploits the heterogeneity of social relations and weak dependency connections in the social network, and employs social dimensions to model social recommendation. the optimal experimental settings for each method were either determined by our experiments or taken from the suggestions of previous works. the settings taken from previous works include the learning rate $\eta$ and the dimension of the latent vectors $d$; all the regularization parameters for the latent vectors were set to the same value. comparisons of recommendation models. we use different amounts of training data to test the algorithms. comparison results are presented in the table, and we make the following observations: (1) our proposed approach soerec always outperforms the baseline methods on both mae and rmse. the major reason is that the proposed framework exploits the heterogeneity of social relations via the time dimension and a network embedding technique. (2) recommendation systems that exploit social relations all perform better, in terms of both mae and rmse, than the pmf method, which uses only the user-item rating matrix. (3) among these relation-aware recommendation methods, those leveraging more indirect relations generally achieve better performance than those using only direct connections. in a word, social relations play an important role in context-aware recommendations. the figure summarizes the user recommendation performance for the state-of-the-art methods and the proposed model. generally speaking, it can be seen from the figure that the average p@k value decreases gradually as k increases. besides, we can also observe on both datasets that: firstly, the proposed method consistently performs better than the baseline methods, indicating that, by considering cross-time evolving graph embedding, the soerec model can produce more appropriate recommendations than models that do not consider the time dimension.
secondly, trust-based algorithms (trustmf, socialmf and rste) consistently perform better than non-trust based benchmarks (socrec, pmf). it is because trust-based algorithms can fully exploit the network structure, which tackles the incomplete, sparse and noisy problem. finally, among the different recommendation methods, considering heterogeneous network (socdimrec and soerec) significantly performs better than the other methods. in this paper, we propose a novel social recommendation model by incorporating cross-time heterogeneity network of relations. we construct an evolving heterogeneous relation network with timestamp information based on multiple objects and links types. the evolving graph can learn more accurate user relations. we then use network embedding technique to encode the latent feature spaces of relations into the objective function. to demonstrate the effective of the proposed model, we construct extensive experiments. the experimental results reveal that our proposed method outperforms the state-of-the-art baseline methods. toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions item-based top-n recommendation algorithms scalable recommendation with hierarchical poisson factorization node vec: scalable feature learning for networks a matrix factorization technique with trust propagation for recommendation in social networks social contextual recommendation semi-supervised classification with graph convolutional networks amazon. com recommendations: item-to-item collaborative filtering eigenrank: a ranking-oriented approach to collaborative filtering learning to recommend with social trust ensemble sorec: social recommendation using probabilistic matrix factorization recommender systems with social regularization probabilistic matrix factorization asymmetric transitivity preserving graph embedding probabilistic matrix factorization item-based collaborative filtering recommendation algorithms line: large-scale information network embedding mtrust: discerning multi-faceted trust in a connected world recommendation with social dimensions cane: context-aware network embedding for relation modeling community-enhanced network representation learning for network analysis max-margin deepwalk: discriminative learning of network representation transnet: translation-based network representation learning for social relation extraction structural deep network embedding collaborative filtering with social exposure: a modular approach to social recommendation social recommendation with strong and weak ties social collaborative filtering by trust network representation learning with rich text information overlapping community detection at scale: a nonnegative matrix factorization approach key: cord- -w agcu authors: lago, andré sousa; dias, joão pedro; ferreira, hugo sereno title: conversational interface for managing non-trivial internet-of-things systems date: - - journal: computational science - iccs doi: . / - - - - _ sha: doc_id: cord_uid: w agcu internet-of-things has reshaped the way people interact with their surroundings. in a smart home, controlling the lights is as simple as speaking to a conversational assistant since everything is now internet-connected. but despite their pervasiveness, most of the existent iot systems provide limited out-of-the-box customization capabilities. 
several solutions try to attain this issue leveraging end-user programming features that allow users to define rules to their systems, at the cost of discarding the easiness of voice interaction. however, as the number of devices increases, along with the number of household members, the complexity of managing such systems becomes a problem, including finding out why something has happened. in this work we present jarvis, a conversational interface to manage iot systems that attempts to address these issues by allowing users to specify time-based rules, use contextual awareness for more natural interactions, provide event management and support causality queries. a proof-of-concept was used to carry out a quasi-experiment with non-technical participants that provides evidence that such approach is intuitive enough to be used by common end-users. the internet of things (iot) is usually defined as the networked connection of everyday objects with actuating and sensing capabilities, often equipped with a collective sense of intelligence [ ] . the integration of such objects creates a vast array of distributed systems that can interact with both the environment and the human beings around them, in a lot of different ways [ ] . this flexibility of iot systems has enabled their use across many different product areas and markets, including smart homes, smart cities, healthcare, transportation, retail, wearables, agriculture and industry [ ] . still, one of the most visible application of iot are customized smart spaces, such as smart homes as the current technology make it possible for consumers to create a customized iot experience based on off-the-shelf products [ ] . the initial popularity of devices such as single-board computers and low-cost micro-controllers, followed by widespread cloud-based solutions controlled by mobile phones, it is now commonplace to remotely interact with a myriad of devices to perform automated tasks such as turning the lights on and opening the garage door just before one arrives home [ , ] . but as the number of devices and interactions grows, so does the management needs of the system as a whole, as it becomes essential to understand and modify the way they (co)operate. in the literature this capability commonly known as end-user programming [ ] , and once we discard trained system integrators and developers, two common approaches emerge: visual programming tools and conversational assistants [ ] . visual programming solutions are usually deployed as centralized orchestrators, with access to the devices and components that comprise such systems. these platforms range from simplified if-then rules (e.g. ifttt ) to exhaustive graphical interfaces (e.g. node-red ) through which one can visualize, configure and customize the devices and systems' behaviour [ , , ] . most visual approaches attempt to offer integration with third-party components (e.g., google calendar), so that their services can be used as part of the system's behavioural rules. these solutions, however, possess some disadvantages for non-technical endusers. consider a node-red system orchestrating an user's smart home with multiple devices. even in situations where there are only a couple of dozen rules defined, it can be challenging to understand why a specific event took place due to the overwhelming data flow that results from these. furthermore, just one dozen rules can already lead to a system not possible to visualize in a single screen [ ] . 
the more rules one adds, the harder it becomes to conceptually grasp what the system can do. part of the reason is because they are built to be imperative, not informative; current solutions mostly lack in meta-facilities that enable the user or the system to query itself [ ] . another common, and sometimes complementary, alternative to visual programming, is the many conversational assistants in the market, such as google assistant, alexa, siri and cortana, that are capable of answering natural language questions and which recently gained the ability to interact with iot devices (see [ ] and [ ] for a comparison of these tools). amongst the most common features they provide is allowing direct interaction with sensing and actuating devices, which enables the end-user to talk to their light bulbs, thermostats, sound systems, and even third-party services. the problem with these solutions is that they are mostly comprised of simple commands and queries directly to the smart devices (e.g. is the baby monitor on? ", "what is the temperature in the living room? ", or "turn on the coffee machine". these limitations mean that although these assistants do provide a comfortable interaction with devices, a huge gap is easily observable regarding their capabilities on managing a system as a whole and allowing the definition of rules for how these smart spaces oper-ate. even simple rules like "close the windows everyday at pm" or "turn on the porch light whenever it rains" are currently not possible, unless one manually defines every single one of them as a capability via a non-conversational mechanism. furthermore, most assistant are deliberately locked to specific vendor devices, thus limiting the overall experience and integration. one can conclude that although current smart assistants can be beneficial and comfortable to use, they do not yet have the complexity and completeness that other systems like node-red. meanwhile, visual programming environments are still far too technical for the common end user. in this paper, we propose a system that tackles the problem of managing iot systems in a conversational approach, towards shortening the existing feature gap between assistants and visual programming environments. parts of this work are from [ ] master's thesis. the rest of this document is structured as follows: sect. provides a summary of related works which identify open research challenges; in sect. we propose our approach to support complex queries in conversational assistants, which implementation details are further presented in sect. ; we proceed in sect. to evaluate our approach using simulated scenarios and experimental studies. finally, sect. drafts some closing remarks. there exists some work in this area that recognize the problem of controlling and managing iot infrastructures by an end-user via a several approaches. kodali et al. [ ] present an home automation system to "increase the comfort and quality of life", by developing an android app that is able to control and monitor home appliances using mqtt, node-red, ifttt, mongoose os and google assistant. their limitations lie in that the flows must first have been first created in node-red, and the conversational interface is used just to trigger them, ignoring all the management activities. austerjost et al. [ ] recognized the usefulness of voice assistants in home automation and developed a system that targets laboratories. 
possible applications reported in their paper include stepwise reading of standard operating procedures and recipes, recitation of chemical substance or reaction parameters to a control, and readout of laboratory devices and sensors. as with the other works presented, their voice user interface only allows controlling devices and reading out specific device data. he et al. [ ] , concludes that, even with conversational assistants, most of iot systems have usability issues when faced with complex situations. as example, the complexity of managing devices schedules rises with the number of devices and the common conflicting preferences of household members. nonetheless, as concluded by ammari et al. [ ] , controlling iot devices is one of the most common uses of such assistants. agadakos et al. [ ] focus on the challenge of understanding the causes and effects of an action to infer a potential sequence. their work is based on a mapping the iot system' devices and potential interactions, measuring expected behaviours with traffic analysis and side-channel information (e.g. power) and detecting causality by matching the mapping with the collected operational data. this approach would potentially allow the end user to ask why is something happening, at the cost of modified hardware and a convoluted side-channel analysis. they did not attempted to port their findings into a conversational approach. braines et al. [ ] present an approach based on controlled natural language (cnl) -natural language using only a restricted set of grammar rules and vocabulary -to control a smart home. their solution supports ( ) direct question/answer exchanges, ( ) questions that require a rationale as response such as "why is the room cold?" and ( ) explicit requests to change a particular state. the most novel part of their solution is in trying to answer questions that require a rational response, however they depend on a pre-defined smart home model that maps all the possible causes to effects. from the above analysis, the authors were not able to found any solution that would simultaneously provide: ( ) a non-trivial management of an iot system, ( ) be comfortable and easy to use by a non-technical audience, and ( ) allow the user to better understand how the system is functioning. by non-trivial we mean that it should be possible to define new rules and modify them via a conversational approach, achieving a de facto integration of multiple devices; not just directly interacting with its basic capabilities. the comfort would be for the user not to have to move or touch a device to get his tasks done (i.e. using voice), or edit a node-red visual flow. as to understanding their system's functioning, we mean the ability to grasp how and why something is happening in their smart space. this last point, combined with the other two, would ideally allow someone to simply ask why something happens. we propose the development of a conversational bot dedicated to the management of iot systems that is capable of defining and managing complex system rules. our prototype is called jarvis, and is available as a reproducible package [ ] . jarvis's abilities reach across different levels of operational complexity, ranging from direct one-time actions (e.g. turn on the light) to repeating conditional actions (e.g. when it is raining, close the windows). jarvis also lets the user easily understand and modify the rules and cooperation of multiple devices in the system, through queries like why did the toaster turn on? 
in these cases, we incorporated jarvis with conversational awareness to allow for chained commands; the following dialogue exemplifies this particular feature: user: "why did the toaster turn on? " jarvis: "you told me to turn it on at am." user: "okay, change it to : am." jarvis: "sure, toaster timer was changed." ... the reader would note that the second user's query would not make sense on its own. we believe that such features improve the user's experience since it avoids repeating information that has already been mentioned in the conversation, and presents a more natural (conversational) interaction. to ease the integration with nowadays systems and provide us an experimental reproducible environment, we integrated the interface with existing platforms such as google assistant and slack , amongst others. we made sure to provide the ability for jarvis to interact both via voice and text. to develop the conversational interface, we decided to opt for dialogflow as this platform provides built-in integration with multiple popular frontends and there exists extensive documentation for this purpose [ ] . in this case, we used ( ) the slack team-communication tool, and ( ) google assistant, so that both text and voice interfaces were covered. in the case of google assistant, the user may use any supported device paired with their account to communicate with jarvis, following a known query prefix such as "hey google, talk to jarvis". regardless of which type of interface is used, the result is converted to strings representing the exact user query and subsequently sent to dialogflow's backend (thus overcoming potential challenges due to speech recognition), which are then analyzed using natural language processing (nlp) techniques. advancement of the existent nlp techniques made available by dialogflow falls out-of-the-scope of this work. upon receiving a request, dialogflow can either produce an automatic response or send the parsed request to a fulfillment backend. this component is thus responsible for parsing the incoming strings into a machine understandable format (json). there are a few key concepts that are leveraged in our implementation: entity. things that exist in a specific iot ecosystem can be represented by different literal strings; for example, an entity identified by toggleable-device may be represented by "living room light" or "kitchen light". additionally, entities may be represented by other entities. dialogflow use of the @ symbol (i.e. @device) for entities, and provides some system's defaults; intent. an intent represents certain type of user interaction. for instance, an intent named turn on/off device may be represented by turn the @device on and turn the @device off. for a request such as "turn the kitchen light on", dialogflow understands that @device corresponds to kitchen light and provides that data to the fulfillment backend; context. contexts allow intents to depend on previous requests, enabling the creation of context-aware interactions. these are what supports queries such as "cancel that" or "change it to am". multiple intents, entities and contexts were defined in jarvis and the main ones are illustrated in fig. . here we provide in detail one of its intents: usage creates an action that is performed upon a certain event, such as an activity of another device or a change of a device's status. 
definition @action:action when @event:event example turn the bedroom light on when the living room light turns off with the above definitions, this component takes requests and builds the corresponding objects containing all actionable information to be sent to the jarvis backend for further processing. for each of the defined intents, this component has an equivalent class responsible for (a) parsing the request, (b) validating its request parameters (e.g. device name or desired action), and (c) generating an appropriate response. an overview is provided in fig. . should the request contain errors, an explanatory response is returned. when all the parameters are considered valid, but the intended device is unclear (e.g. user wants to turn on the light, but there is more than one light), the generated response specifically asks the user for further clarification in order to gain context. to tackle cancellation intents, we model all actions using the command design pattern, thus providing both a straightforward execute and undo mechanism, as well as an history of performed actions. for most intents, such as direct actions or "why did something happen?" queries, the effects are immediate. however, period actions, events and causality queries require a different design approach so that they can perform actions on the backend without the need for a request to trigger them. a period action is an intent be carried and then undone after a certain period (e.g. "turn on the light from pm to pm"). in these scenarios, we generate a state machine to differentiate between all the different action status, such as (a) nothing has executed yet (before pm), (b) only the first action was executed (after but before pm), and (c) both have been executed (after pm). we use a combination of schedulers and threads to guarantee proper action, and abstract all these details inside the command pattern. the same strategy applies for rules such as "turn on the light every day at pm", with the appropriate state machine and scheduler modifications. this mechanism is (obviously) different for events that are the result of intentions such as "turn on the kitchen light when the presence sensor is activated". here, we leverage a publish-subscribe approach by orchestrating multiple unique and identifiable message queues for the devices' actions and state transitions. upon startup of the backend, we create class listeners that subscribe the corresponding event queues of available devices, and callback the jarvis backend when appropriate. this orchestration management is dynamic, and depends on the specific rules that are defined. in the aforementioned intent, we would add an observer to look for messages on the presence sensor's event queue with the value on. causality queries. this relate to the user asking why something happened (e.g. "why did the light turn on? "). to implement them, we augment each command to determine whether it can cause a specific condition to be true. this per se solves some scenarios where the answer can be found by looking at the history of executed commands (e.g. "because you asked me to turn it on at : pm"). however, there might exist multiple rules may have cause the condition to be true, in which case it is not enough to blame the latest logged command. instead, there are two possible approaches: (a) return the earliest possible cause, or (b) use a heuristic to infer the most relevant one. another non-trivial scenario is where the explanation is due to a chain of interconnected rules. 
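to make these mechanisms more tangible, below is a small sketch, in python, of how an undoable command, a publish-subscribe event rule, and a history-based causality lookup could fit together. it is an illustration rather than jarvis' actual code: the in-memory queues stand in for the per-device message queues described above, and the rule wording, device names and helper functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

SUBSCRIBERS = {}                       # queue name -> callbacks (stand-in for real message queues)
COMMAND_LOG: List["EventRule"] = []    # execution history used to answer "why did X happen?"

def subscribe(queue, callback):
    SUBSCRIBERS.setdefault(queue, []).append(callback)

def publish(queue, payload):
    for cb in SUBSCRIBERS.get(queue, []):
        cb(payload)

@dataclass
class EventRule:
    """e.g. 'turn on the kitchen light when the presence sensor is activated'."""
    description: str
    trigger_queue: str               # event queue of the triggering device
    trigger_value: str               # payload value that fires the rule
    action: Callable[[], None]       # the wrapped command (could expose undo as well)
    can_cause: list = field(default_factory=list)   # conditions this rule may cause

    def on_event(self, payload):
        if payload == self.trigger_value:
            self.action()
            COMMAND_LOG.append(self)  # log the execution for causality queries

def why(condition):
    """Answer a causality query by scanning history for the earliest possible cause."""
    for entry in COMMAND_LOG:
        if condition in entry.can_cause:
            return f"because of the rule: {entry.description}"
    return "i could not find a rule that caused that."

# wire a rule to a device's event queue and simulate a sensor reading
rule = EventRule(
    description="turn on the kitchen light when the presence sensor is activated",
    trigger_queue="presence-sensor/events",
    trigger_value="on",
    action=lambda: print("kitchen light -> on"),
    can_cause=["kitchen light is on"],
)
subscribe(rule.trigger_queue, rule.on_event)
publish("presence-sensor/events", "on")
print(why("kitchen light is on"))
```

when the explanation involves a chain of interconnected rules, a single logged entry such as this is no longer sufficient, which motivates the reply strategies discussed next.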
here, it seems that one can (a) reply with the complete chain of events, (b) reply with the latest possible cause, or (c) engage in a conversation through which the user can explore the full chain of events as they deem adequate (e.g. "tell me more about things that are triggered by rain"). in this work, we opted to use the earliest possible cause for the first scenario, and the latest for the second; more complex alternatives can be found in [ , ] . in order to understand how jarvis compares to other systems, we established a baseline based on ( ) a visual programming language, and ( ) a conversational interface. node-red was picked amongst the available visual programming solution, as it is one of the most popular visual programming solutions [ ] . it follows a flow-based programming paradigm, providing its users with a web-based application through which they can manage rules via connections between nodes that represent devices, events and actions [ ] . google assistant was selected for the conversational interface due to its naturality . there are plenty of ways users can interact with it: (a) the standalone google apps, (b) built-in integration with android and chrome os, or (c) with standalone hardware such as the google home. we compare to this baseline according to two criteria: ( ) the number of different features, and ( ) their user experience in terms of easiness of usage and intuitiveness. for the first, we created a list of simulated scenarios to assess the ability to manage iot systems. we then performed a (quasi-)controlled experiment with users to assess the second criteria. table summarizes the comparison of our prototype to the chosen baseline. it is important to clarify that one-time action w/ unclear device refers to actions like "turn on the light" with which jarvis asks the user to clarify which device he means based through responses such as "do you mean the bedroom or living room light?". a cancel last command refers to the ability of undoing the last action or rule creation by specifically saying that. finally, rules defined for device refers to the user performing queries that require introspection, such as "what rules are defined for the bedroom light?" it is easy to observe that our prototype provides several features that are not present in either the google assistant or node-red. both of these products do a lot more than these features, but especially with the assistant, the advantage is clear since the only kind of iot feature it supports is the one-time action. our second conclusion is that it is possible to bring some of the features currently available in visual programming environments to a conversational interface; the converse (how to bring conversational features to node-red), eludes the authors. we performed a (quasi-)controlled experiment with participants, to gain insight into how end users responded to a conversational approach. our sample includes mostly participants without formal technological skills ( ) , with ages ranging from to . we made sure that (a) all were familiar with basic technologies, though, such as basic usage of smartphones and the internet, and (b) that even non-native english participants had adequate speaking and understanding skills. methodology. each participant was given tasks ( control task and study tasks) to be performed with the help of jarvis, using google assistant as the system interface. 
as an example, this was one of the sets of tasks given to participants within a scenario with a living room light, bedroom light and a living room motion sensor : the only instructions given were that they should talk to the phone in a way that feels the most natural to them to complete the task at hand. besides the tasks, participants were also given the list of iot devices available in the simulated smart house that they would be attempting to manage through. as a way of increasing the diversity and reducing the bias of the study, we created two different sets of tasks that participants were assigned to randomly. each set also had different devices, with different smart house topologies. the participants were assigned to one of the test sets randomly. variable identification. for each of the tasks, we collected ( ) if the participant was able to complete it, ( ) the time taken to complete, and ( ) the number of unsuccessful queries. this count was made separately for (a) queries that were not understood by the assistant's speech recognition capabilities (e.g. microphone malfunction, background noise), (b) queries where the user missed the intention or made a syntactic/semantic error (e.g. "turn up the lighting"), and (c) valid queries that an human could interpret, but that jarvis was unable to. subjective perception. after completing the tasks, we introduced a nonconversational alternative (node-red), explaining how all tasks could have been performed using that tool. we inquired the participants whether they perceived any advantages of jarvis over such a tool, and whether they would prefer jarvis over non-conversational tools. finally the participants were asked if they had any suggestions to improve jarvis and the way it handles system management. results. table compiles the results observed during the study, each row representing a task given to the participant. each column means: -task: number of the task ( - ) and the task set number in parenthesis ( / ); -done: percentage of participants that completed the task successfully; -time: time in seconds that participants took to complete the task; -iq (ast): number of occurrences that queries were incorrect due to the google assistant not properly recognizing the user's speech; -iq (user): number of occurrences that queries were incorrect due to the user not speaking a valid query; -iq (jvs): number of occurrences that queries were incorrect due to jarvis not recognizing a valid query; -iq: total invalid queries, i.e. sum of iq (ast), iq (user) and iq (jvs). discussion. the complexity of the queries increases from task to task since the queries require more words or interactions. this is reflected by the corresponding increase in time in both task sets. the numbers related to incorrect queries show some occurrences at the (voice) assistant level, which means the speech recognition failed to correctly translate what the participants said. although this does not have implications on the evaluation of jarvis, it does indicate that this sort of systems might be harder to use due if they are not truly multilingual. directly comparing the time needed to complete a task to what would be needed to perform it in a visual programming language is meaningless; either the task is not defined, and that would require orders of magnitude longer than what we observe here, or the task is defined and the times will be obviously similar. similarly, we also observe a few instances of incorrect queries due to grammar mistakes or semantically meaningless, cf. 
iq (user), and therefore did not match the sample queries defined in dialogflow. nevertheless, there were grammatically incorrect user queries, such as "turn on lights", that still carry enough information to understand what the user's intent is. we consider more serious the number of valid sentences that were treated as incorrect queries by jarvis, cf. iq (jvs). these could have been caused either by a mispronunciation of a device's name or by a sentence structure that is unrecognizable by the dialogflow configuration. this possibly represents the most serious threat to our proposal, to which we will later dedicate some thoughts on how to mitigate it. nonetheless, the success rate across all tasks is very high, which provides evidence that the system might be intuitive enough to be used without previous instruction or training. these points were reflected in the participants' subjective perception, where they claimed jarvis to be easy to use, intuitive, and comfortable; ultimately, these would be the deciding factors for end-users to prefer jarvis over a non-conversational interface. an additional observation pertaining to jarvis' answers, particularly those regarding causality queries, was made by some users, who claimed that if the provided response was too long, it would become harder to understand due to the sheer increase of conveyed information. a possible solution for this problem would be to use a hybrid interface that provides both visual and audio interactions, but there could be other approaches, such as an interactive dialogue that shortens the sentences. threats to validity. empirical methods seem to be one of the most appropriate techniques for assessing our approach (as it involves the analysis of human-computer interaction), but they are not without liabilities that might limit the extent to which we can assess our goals. we identify the following threats: (1) natural language capabilities, where queries like "enable the lights" might not be very common or semantically correct, but still carry enough information that a human would understand the intention. the same happens with device identification, such as when the user says "turn on the bedroom lights" and the query fails due to the usage of the plural form. during our study, we observed many different valid queries that did not work because they were not covered by the dialogflow configuration; (2) coverage error, which refers to the mismatch between the target population and the frame population. in this scenario, our target population was (non-technical) end-users, while the frame population was all users that volunteered to participate; and (3) sampling errors are also possible, given that our sample is a small subset of the target population. repeating the experience would necessarily cover a different sample population, and likely attain different results. we mitigate these threats by providing a reproducible package [ ] so other researchers can perform their own validation.
in this paper we presented a conversational interface prototype able to carry several different management tasks currently not supported by voice assistants, with capabilities that include: ( ) delayed, periodic and repeating actions, enabling users to perform queries such as "turn on the light in min" and "turn on the light every day at am"; ( ) the usage of contextual awareness for more natural conversations, allowing interactions that last for multiple sentences and provide a more intuitive conversation, e.g. "what rules do i have defined for the living room light?"; ( ) event management, that allows orchestration of multiples devices that might not necessarily know that each other exists, e.g. "turn on the light when the motion sensor is activated"; and ( ) causality queries, to better understand how the current system operates, e.g. "why did the light turn on?" we conducted (quasi-)controlled experiments with participants that were asked to perform certain tasks with our system. the overall high success rate shows that the system is intuitive enough to be used by people without significant technological knowledge. it also shows that most challenges lie in the natural language capabilities of the system, as it is hard to predict them any user queries that have the same intrinsic meaning. we thus conclude that incorporating recent nlp advances (that were beyond the scope of this paper) would have an high impact in terms of making it more flexible to the many different ways (correct or incorrect) that users articulate the same intentions. nonetheless, by doing a feature comparison, we can observe that jarvis was able to implement many features that current conversational assistants are lacking, while simultaneously being more user-friendly than the available alternatives to iot management (such as visual programming approaches). as future work, we believe that our approach could be improved by sometimes engaging in a longer (but fragmented) conversation with the user, particularly when providing causality explanations. this would allow the user to understand more information at his own pace, but also because it would enable them to make changes to the rules as the conversation unfolds. butterfly effect: causality from chaos in the iot music, search, and iot: how people (really) use voice assistants introducing a virtual assistant to the lab: a voice user interface for the intuitive control of laboratory instruments conversational homes: a uniform natural language approach for collaboration among humans and devices a reactive and model-based approach for developing internet-of-things systems meta-design: a manifesto for end-user development end-user development fog at the edge: experiences building an edge computing platform when smart devices are stupid: negative experiences using home smart devices hands-on chatbots and conversational ui development: build chatbots and voice user interfaces with chatfuel, dialogflow, microsoft bot framework, twilio, and alexa skills visual dataflow modelling -some thoughts on complexity low cost smart home automation system using smart phone exploring complex event management in smart-spaces through a conversation-based approach andrelago /jarvis: initial release alexa vs. siri vs. cortana vs. google assistant: a comparison of speech-based natural user interfaces an iot-based user-centric ecosystem for heterogeneous smart home environments from the internet of things to the internet of people conversational interface challenges. 
developing conversational interfaces for ios from iot mashups to model-based iot a survey on visual programming languages in internet of things internet of things strategies for the integration of smart technologies into buildings and construction assemblies key: cord- - p bf ik authors: lai, lucinda; sato, rintaro; he, shuhan; ouchi, kei; leiter, richard; thomas, jane delima; lawton, andrew; landman, adam b.; zhang, haipeng mark title: usage patterns of a web-based palliative care content platform (pallicovid) during the covid- pandemic date: - - journal: j pain symptom manage doi: . /j.jpainsymman. . . sha: doc_id: cord_uid: p bf ik background: the covid- pandemic has highlighted the essential role of palliative care to support the delivery of compassionate, goal-concordant patient care. we created the web-based application, pallicovid (https://pallicovid.app/), in april to provide all clinicians with convenient access to palliative care resources and support. pallicovid features evidence-based clinical guidelines, educational content, and institutional protocols related to palliative care for covid- patients. it is a publicly available resource accessible from any mobile device or desktop computer that provides clinicians with access to palliative care guidance across a variety of care settings, including the emergency department, hospital ward, intensive care unit, and primary care practice. objective: the primary objective of this study was to evaluate usage patterns of pallicovid to understand user behavior in relation to this palliative care content platform during the period of the local peak of covid- infection in massachusetts. design: we retrospectively analyzed de-identified usage data collected by google analytics from the first day of pallicovid’s launch on april , until may , , the time period that encompassed the local peak of the covid- surge in massachusetts. measure: ments: user access data was collected and summarized by google analytics software that had been integrated into the pallicovid web application. results: , users accessed pallicovid and viewed , pages from april to may , . users spent an average of minutes and seconds per session. % of users were first-time visitors, while the remaining % were return visitors. the majority of users accessed pallicovid from the united states ( %), with a large proportion of users coming from boston and the surrounding cities ( % of overall users). conclusions: pallicovid is one example of a scalable digital health solution that can bring palliative care resources to frontline clinicians. analysis of pallicovid usage patterns has the potential to inform the improvement of the platform to better meet the needs of its user base and guide future dissemination strategies. the quantitative data presented here, although informative about user behavior, should be supplemented with future qualitative research to further define the impact of this tool and extend our ability to deliver clinical care that is compassionate, rational, and well-aligned with patients’ values and goals. the novel coronavirus (covid- ) pandemic has strained the healthcare system beyond its usual capacity to accommodate such a large influx of seriously ill patients. , this has created the need to bring palliative care medicine practices to the frontline across specialties. 
many clinicians who have not been specially trained in palliative care do not feel adequately prepared to facilitate difficult conversations with patients or their family members about goals of care. providing these clinicians with convenient access to focused education, clinical reference materials, and specialist palliative care support may improve end of life care for patients. as the covid- surge approached boston in april , our interdisciplinary team of emergency physicians, palliative care specialists, and digital health innovators at partners healthcare recognized the need to rapidly increase the capacity to deliver primary palliative care at the frontline. partners healthcare is a large academic integrated healthcare system that was founded by massachusetts general hospital and brigham and women's hospital in , and now consists of a core network of eleven hospitals in the new england area. our goal was to create a centralized resource of information that would support partners healthcare staff to provide goal-concordant care and expert symptom management to patients who were ill with suspected covid- infection. given the challenges of distributing and maintaining the quality of up-to-date information in such a fast-changing environment, we sought out a digital solution. we aimed to develop an online, centralized compendium of clinical reference materials, focused educational content, and institutional protocols intended to seamlessly integrate palliative care across a variety of care settings, including the emergency department, hospital ward, intensive care unit, and primary care practice with respect to partners-specific resources and established protocols. we built a progressive web application (pwa) called "pallicovid" for the purpose of sharing digital content related to the delivery of palliative care during the covid- pandemic. pwas have previously been noted to have significant untapped potential in the field of healthcare to improve work efficiency and quality of care. the main advantages of pwas is that they provide the user with a similar experience to using a native application, but are accessed via a web browser, do not require download from an app store, can be shared via a url link, and can be found via a web search engine. users around the world can access pallicovid via web browser on any mobile or desktop device, thus allowing for wide dissemination of the digital content posted by the authors. at the same time, the authors are able to centrally manage the content and make instantaneous updates as needed, which is an important mechanism for quality control considering the rapidly changing information environment of the pandemic. the clinical guidelines and content featured on pallicovid were sourced, produced, and reviewed by palliative care specialists at brigham and women's hospital and dana-farber cancer institute and emergency physicians from brigham and women's hospital. this interdisciplinary team of nurses, physicians, and physicians assistants met weekly to discuss updates to the content. we prioritized the publication of guidelines that were succinct, specific, and commensurate with the best available scientific evidence about the covid- disease process and end of life care. 
we established several key criteria for content posted on pallicovid: • accurate: content was reviewed by palliative care experts to reflect the best available scientific evidence • practical: recommendations were designed to be useful and implementable by nonpalliative care clinicians in a variety of care settings • accessible: content was presented in a format that was optimized for viewing on both mobile devices and desktop computer screens • applicable: content was specific to the care of patients with confirmed or suspected covid- infection and took into account the need to limit face-to-face interactions due to enhanced infection control measures and restricted visitor policies we have included examples of pallicovid resources that respectively address the three major domains of palliative care expertise with regards to symptom management, goals of care discussions, and family support: , opioid dosing recommendations for the treatment of dyspnea and pain at the end of life (figure ), a conversation guide for rapid code status determination in the peri-intubation setting (figure ), and a tip sheet for responding to difficult questions from family members of dying patients ( figure ). although certain features of pallicovid are restricted to users within the partners healthcare system, such as one-click access to the hospital paging system, we have made the majority of the content on pallicovid open to public access. the primary objective of this study was to collect and analyze usage data from pallicovid as a way to better understand user behavior and gain insights about the population of users accessing this palliative care content platform. the secondary objective was to use the insights derived from user engagement behavior to formulate ideas about future improvements to the platform and future dissemination strategies. we retrospectively analyzed de-identified usage data that was collected by google analytics from the first day of pallicovid's launch on april , through may , . google analytics is a free tool that provides quantitative data on website usage and has previously been used in health research for process evaluation and quality improvement. this project was undertaken as a quality improvement initiative at brigham and women's hospital and as such was not formally supervised by the institutional review board per their policies. we followed the squire guidelines for quality improvement reporting. evaluating usage we evaluated usage data using google analytics (google, llc, mountain view, ca), which was installed on pallicovid and used to track user data from april , through may , ). this allowed collection of data related to user behavior emanating from a user's interaction with the application. the data could come from avenues such as the urls of the pages the user viewed, the location of the user, and the type of device being used to access the application. google analytics also collected information about the nature of the visit such as the content viewed and length of the session. google analytics did not collect any personally identifiable information and presented all collected data in aggregate form, thus mitigating the ethical concerns that could arise from user behavior research. , we evaluated user engagement by examining "sessions". a session was defined as a series of interactions by a user that took place within a predetermined time frame ( minutes), which represented the period of time that the user was actively engaged with the application. 
the number of returning users referred to the number of sessions visited by the same client id. a relatively high proportion of returning users has been noted in web-based mental health intervention research to be a marker of user engagement. , , the number of pages per session refers to the number of webpages within the platform that the user viewed in a single session, and the mean session duration referred to the mean duration of time that users spent on the platform (reported in minutes and seconds). this kind of user traffic data provides an approximation of the degree of exposure that users had to the content being hosted on the platform. graphics demonstrating the geographic distribution of users were created by google analytics. tables demonstrating the distribution of sessions, visits, most viewed pages, and device types were created using microsoft excel, version . (microsoft corporation, seattle, wa). there was a total of , users and , page views during the period of april , to may , . % of users were first-time visitors to the application, with the remaining % representing return visitors ( figure ). % of sessions were accessed by first-time or secondtime visitors to the application (table ) . users spent an average of minutes and seconds per session on pallicovid, with % of sessions lasting less than one minute (table ). the most viewed page was the home page ( . % of total page views), followed by quick guides to symptom management (an example of which is included as figure ), the page with indexed links to covidprotocols.org, the section on emergency department resources, and the collection of videos simulating different types of difficult conversations (table ) . almost all users used either mobile devices ( %) or desktop computers ( %) to access the application (table ). only % of users accessed pallicovid by tablet device. the majority of users ( %) accessed pallicovid from the united states, with % of all users located in boston and the surrounding cities. the remainder of users accessed pallicovid from other countries outside of the u.s. in the response to the large influx of seriously ill patients during the peak period of the covid- pandemic in boston, pallicovid was created as a scalable digital solution to increase access to palliative care educational resources and specialist support. analyzing user engagement behavior, we found that three-quarters of sessions ( %) consisted of first-time and second-time visitors to the application. the remaining % of sessions represented usage by "frequent users," defined as those who have visited the application three or more times. a previous study on usage patterns of a mobile palliative care application found that a small subset of users ( % of users) comprised the majority of all activity in the application ( % of activity). one possible explanation for this distribution are frequent users who coming back to the application as a "just in time" clinical reference tool, whereas infrequent users were accessing the application for more passive educational purposes. pallicovid content features both types of information: "just in time" clinical reference material such as medication dosing recommendations, as well as focused educational content such as difficult conversation simulation videos, so this may explain the different distribution of frequent versus infrequent users. 
furthermore, we limited our data collection period to less than four weeks of operation, which may not be enough time to observe trends in users returning to the application. average duration of sessions on pallicovid was short, at less than minutes per session for mobile users and less than minutes per session for desktop users. some may interpret the short session duration to indicate a degree of user disengagement, but an alternative explanation may be that users were able to quickly find what they were looking for. sessions in which users access the link to the hospital paging directory from the pallicovid home page, for example, would cause the user to exit the application and end the session relatively quickly. in that case, a short session duration would have indicated that pallicovid had successfully carried out a function that it was designed to perform. the platform can be improved based on user behavior. future iterations of the application will focus on improving user engagement with the content pages for "nursing resources" and "hospice care referrals", which did not rank among the top five most-viewed pages. the lower ranking of these pages may be a reflection of the original dissemination strategy, which focused more heavily on reaching physicians and advanced practice providers (app) rather than the nursing staff. future dissemination strategies should specifically target nurses or have product design specifications to specific user populations as well. population and user specific design may be critical as different roles such as nursing spend the majority of their time at patients' bedsides and are more likely to be aware of patients' specific symptoms and end of life needs than perhaps physicians or apps. there was an overall even split in device types between mobile and desktop, arguing against a stronger user preference for one type of device over the other. as we add future content to the platform, we should continue to optimize viewing for both smaller screens and larger screens with vector graphics, responsive web design, and screen size flexibility. when observing the location of the users, % of users were found to be from boston and surrounding cities. this finding aligns with our implementation science focus on engaging users about the platform primarily within the partners institutions. in addition, a subset of the content was partners institution-specific and paging directory access was restricted to members of the partners network. future reorganization of the content into internal-and external-facing sections may make the application more inviting to users outside of the partners enterprise. the majority of the clinical content contained within the guides would be applicable to patient care regardless of the particular institution, so future improvements to the platform may include better sign-posting of content as broadly applicable and not partners specific. although google analytics has provided promising data about the usage patterns of pallicovid, the data should be interpreted cautiously. google analytics collects and presents data with a marketing and e-commerce framework for understanding web-based behavior, rather than a health research frame. aggregation of the data, in addition to potential discrepancies in how users are counted when users delete browser cookies, switch devices, or create a new client id by changing browsers, creates bias in the data. 
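since google analytics reports these engagement figures only in aggregate, it may help to see roughly how sessions, returning users, pages per session and mean session duration are derived; the sketch below computes them from a hypothetical pageview log. the 30-minute inactivity window is google analytics' common default and stands in for the unspecified timeout mentioned above; the log format and client ids are illustrative and are not actual pallicovid data.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60          # assumed 30-minute inactivity window, in seconds

def sessionize(pageviews):
    """Group (client_id, timestamp, url) pageviews into per-client sessions."""
    by_client = defaultdict(list)
    for client_id, ts, url in sorted(pageviews, key=lambda p: (p[0], p[1])):
        sessions = by_client[client_id]
        if sessions and ts - sessions[-1][-1][1] <= SESSION_TIMEOUT:
            sessions[-1].append((client_id, ts, url))   # continue current session
        else:
            sessions.append([(client_id, ts, url)])     # start a new session
    return by_client

def engagement_summary(pageviews):
    by_client = sessionize(pageviews)
    all_sessions = [s for sessions in by_client.values() for s in sessions]
    n_sessions = len(all_sessions)
    return {
        "users": len(by_client),
        "returning_users": sum(1 for s in by_client.values() if len(s) > 1),
        "pages_per_session": sum(len(s) for s in all_sessions) / n_sessions,
        "mean_session_duration_s": sum(s[-1][1] - s[0][1] for s in all_sessions) / n_sessions,
    }

# toy log: (client_id, unix_timestamp, page), with one returning visitor
log = [("a", 0, "/"), ("a", 60, "/symptom-guides"), ("a", 4000, "/"),
       ("b", 10, "/"), ("b", 100, "/ed-resources")]
print(engagement_summary(log))
```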
nonetheless, we believe there is still value in tracking overall engagement numbers and observing trends in usage over time in order to continually evaluate and improve the performance of the platform. as the volume of patients with covid- infection rises and falls through different phases of the pandemic, the need for a digital solution such as pallicovid may also fluctuate over time. , because our team focused our efforts on the development of a resource that would be maximally useful to clinicians within the partners healthcare system, the sample size was limited to its distribution within this institution and geographic area. we considered uptake of the resource outside of the partners enterprise to be a secondary benefit. future studies should include a more detailed analysis of longitudinal user data or a mixed-methods assessment with focus groups, surveys, or in-depth interviews, to supplement google analytics data and allow for a more comprehensive quality improvement study. this study described usage patterns of a web-based palliative care content platform over its first month of operation in april . we have demonstrated that, even in the midst of a global pandemic, it is possible to rapidly design and implement a digital solution in response to an unprecedented healthcare challenge. we have also demonstrated the use of a free, open-access tool such as google analytics to evaluate patterns of user behavior, consequences of the dissemination strategy, and aspects of the platform that may be amenable to future improvements. quantitative data should be combined with qualitative research to provide more accurate interpretations of user behavior. we hope that clinicians who access the content on pallicovid may feel empowered to deliver care that is focused on dignity, symptom control, and avoidance of unnecessarily invasive or non-beneficial interventions whenever possible.
[figure: example quick guide giving rapid, patient-centered intubation recommendations for patients at high risk of poor outcomes, with sample phrases for communicating with family members (e.g., encouraging family, even by phone, to talk with the patient as they normally would).]
references:
• the importance of addressing advance care planning and decisions about do-not-resuscitate orders during novel coronavirus (covid- )
• creating a palliative care inpatient response plan for covid- : the uw medicine experience
• what matters most in end-of-life care: perceptions of seriously ill patients and their family members
• should doctors withhold ventilator? the new york times
• fair allocation of scarce medical resources in the time of covid-
• pandemic palliative care: beyond ventilators and saving lives
• emergency medicine physicians' perspectives of providing palliative care in an emergency department
• towards pwa in healthcare
• barriers to access to palliative care. palliative care: research and treatment
• squire (standards for quality improvement reporting excellence): revised publication guidelines from a detailed consensus process
• a process evaluation of a web-based mental health portal (walkalong) using google analytics
• evaluating information seeking and use in the changing virtual world: the emerging role of google analytics
• an objective approach to evaluating an internet-delivered genetics education resource developed for nurses: using google analytics tm to monitor global visitor engagement
• usage patterns of a mobile palliative care application
• response and role of palliative care during the covid- pandemic: a national telephone survey of hospices in italy
• palliative care pandemic pack: a specialist palliative care service response to planning the covid- pandemic
key: cord- -r gbw j authors: wang, hao; shen, huawei; cheng, xueqi title: modeling users' multifaceted interest correlation for social recommendation date: - - journal: advances in knowledge discovery and data mining doi: . / - - - - _ sha: doc_id: cord_uid: r gbw j
recommender systems suggest to users the items that are potentially of interest to them, by mining users' feedback data on items. social relations provide an independent source of information about users and can be exploited for improving recommendation performance. most existing recommendation methods exploit social influence by refining social relations into a scalar indicator to either directly recommend friends' visited items to users or constrain friends' embeddings to be similar. however, a scalar indicator cannot express the multifaceted interest correlations between users, since each user's interest is distributed across multiple dimensions. to address this issue, we propose a new embedding-based framework, which exploits users' multifaceted interest correlation for social recommendation. we design a dimension-wise attention mechanism to learn a correlation vector to characterize the interest correlation between a pair of friends, capturing the high variation of users' interest correlation on multiple dimensions. moreover, we use friends' embeddings to smooth a user's own embedding with the correlation vector as weights, building elaborate unstructured social influence between users. experimental results on two real-world datasets demonstrate that modeling users' multifaceted interest correlations can significantly improve recommendation performance. recommender systems suggest to users the items that are potentially of interest to them [ ] by mining users' feedback data on items [ ]. real-world recommender systems often allow users to build social relations [ ], and such social relations provide an independent source of information about users beyond the feedback information [ ]. social correlation theories [ ], such as homophily and social influence, indicate that there are correlations between two socially connected users [ ], which can potentially be used to exploit social relations for improving recommendation accuracy [ ]. many methods have been proposed for social recommendation in recent years, and these methods can be mainly grouped into two categories: ( ) memory-based methods [ , , ] use social relations as an indicator that filters relevant users and directly recommend friends' visited items to a user; ( ) model-based methods [ , , , , , , , ] integrate social relations into factorization methods to constrain friends to share similar interest embeddings.
moreover, feedbackbased similarities are utilized to weigh friends' interest relevance in memorybased methods [ ] or embedding coherence in model-based methods [ , ] . in sum, existing methods refine two users' social relation into a scalar indicator to build their interest correlation. however, each user's interest is differently distributed across multiple dimensions, and the consistency in one dimension does not mean consistency in other dimensions. as fig. shows, a user's interest similarities with his friends vary greatly in different categories of items. when the user needs suggestions on one category, he may refer to friends with strong correlations on that category, and suggestions of friends with strong correlations on other categories are not useful. therefore, a global scalar indicator used in existing methods cannot express the multifaceted interest correlations between friends. unfortunately, there exists no explicit evidence to refine social networks into the elaborate correlation, and simply distinguishing items' categories would make the problem of data sparsity even more serious, which is not conducive to the learning of model parameters. in this paper, we propose a new embedding-based social recommendation method (fig. ) . we propose to use a correlation vector, instead of a scalar value, to characterize the interest correlation between each pair of friends, and design a dimension-wise attention mechanism with the social network as input to learn it. the correlation vector has the same dimension with user's embedding, thus can sufficiently capture the high variation in users' interest correlations on each fine-grained dimension. moreover, we smooth a user's embedding by his friends' embeddings, with the correlation vector into consideration. the combination of the dimension-wise attention mechanism and the smoothing operation can impose strong and delicate unstructured correlations on users' embeddings while making interactions between users and items. such an end-to-end framework allow the proposed method to learn the unstructured correlations in a fully data-driven manner. we evaluate the proposed method by extensive experiments on two realworld datasets collected from gowalla and epinions respectively. experimental results show that the proposed method outperforms the state-of-the-art social recommendation methods. recommender systems normally utilize the user-item rating information for recommendation. collaborative filtering [ ] has become one of the most popular technologies, which achieves good recommendation results. social recommender systems leverage the social network information to enhance traditional recommendation methods [ , , , ] . according to the nature of the existing social recommendation techniques, we classify them into two main categories: memory-based methods [ , , ] which normally directly or indirectly recommend users items that their friends like, and model-based methods [ , , [ ] [ ] [ ] , , , ] use users' social relations to constrain that friends share similar embeddings. in sum, existing methods use a scalar value to build friends' interest correlation, which cannot sufficiently express their multifaceted interest correlations. although yang et al. [ ] integrate items' category information to train a matrix factorization model for each category of items, they make data too sparse to learn parameters, and they cannot utilize correlations among different categories. attention mechanism has recently been used in recommendation tasks [ , , ] . 
for example, sun et al. [ ] use attention to model the dynamic social influence for recommendation. however, they still express users' interest correlation by a scalar value, which cannot sufficiently capture the high variation of users' interest correlation. for ease of description, we first formalize the variables used and the problem dealt with in this paper. we denote with u and i the set of users and the set of items respectively. for a user u and an item i, we denote with r ui u's feedback to item i. we use s to represent the social network over users in u. s u represents user u's friend set and s uv = 1 (0) indicates whether (or not) there exists a social relation between user u and user v. in our model, we learn a preference vector t u for each user u and a preference vector z i for each item i. item recommendation: given a set of users u with social relations s, a set of items i and u's feedback over items i, item recommendation recommends for each target user u ∈ u a list of items {i|i ∈ i} consisting of items that the target user is potentially interested in and has not interacted with up to the time of recommendation. next, we present the proposed dimension-wise attention model for social recommendation, i.e., dasr in fig. . users' interest is often differently distributed across multiple dimensions. to accurately capture friends' influence on a user's preference, one needs to model the multi-dimensional interest correlation between users. fortunately, the attention mechanism seems to provide a feasible solution, since it can automatically model and select pertinent pieces of information with attentive weights from a set of inputs, where higher (lower) weights indicate that the corresponding inputs are more (less) informative for generating the outputs. to accommodate our problem, we further design a dimension-wise attention mechanism and use it to learn a correlation vector for each pair of friends, building their multi-dimensional interest correlation for social recommendation. figure shows the architecture of dasr, which includes an attention layer, a smoothing layer and a recommendation layer. we learn a preference vector for each user and item, namely, t u and z i , and use dasr to infer a target user u's preference to a candidate item i, taking the social influence of u's friends into consideration. instead of directly performing an inner product between t u and z i in the recommendation layer, we first use the embeddings of user u's friends to smooth user u's own embedding in the smoothing layer, and the smoothing weights are the correlation vector learned in the attention layer. with these designs, we can learn strong and delicate unstructured correlations of users' embeddings in a fully data-driven manner and provide better item recommendation. we input the embeddings of the target user u and users in his friend set s u to the attention layer, and compute the interest correlation vector between user u and each friend v. we first use a weight matrix w a to perform self-attention on user u and friend v as follows: where t u and t v are the embeddings of user u and friend v, t represents transposition and || denotes the concatenation operation. leakyrelu(x) = max(0, x) + β·min(0, x) acts as the non-linear activation function with β as the negative slope. e uv is the attention coefficient that indicates the importance of friend v's features to user u. to ensure that e uv can express the correlation of each dimension in users' embeddings, we design w a as a weight matrix with dimension d * d.
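a minimal sketch of the attention layer just described, assuming a softmax is used to normalise the per-dimension coefficients across friends (the normalisation step is only described in words above) and assuming w a is shaped so that e uv has the same dimensionality d as the user embeddings. the tensor shapes and the use of pytorch are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def dimension_wise_attention(t_u, friend_embs, W_a, beta=0.2):
    """
    t_u:         (d,)    embedding of the target user u
    friend_embs: (n, d)  embeddings of the n friends in S_u
    W_a:         (d, 2d) attention weight matrix applied to the concatenation [t_u || t_v]
    returns alpha: (n, d), one normalised correlation vector per friend.
    """
    n, d = friend_embs.shape
    pairs = torch.cat([t_u.unsqueeze(0).expand(n, d), friend_embs], dim=1)  # (n, 2d)
    e = F.leaky_relu(pairs @ W_a.t(), negative_slope=beta)  # rows are the columns of C_u, one per friend
    alpha = torch.softmax(e, dim=0)  # assumed: normalise each dimension across all friends v
    return alpha
```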
for user u, we get an interest correlation matrix c u , and each column of c u represents the correlation vector between user u and one of his friends. to make coefficients easily comparable across different friends, we normalize each row of c u across all choices of v. we denote with α uv the normalized interest correlation vector between user u and friend v: for each friend v in user u's friend set s u , we obtain a normalized correlation vector α uv to represent the dimension-wise interest correlation between user u and user v. we then smooth user u's embedding by adding each friend v's embedding, with the correlation vector α uv serving as the smoothing weight. where h u is user u's smoothed embedding, ⊙ is the element-wise hadamard product, and σ(z) = 1/(1 + e^(-z)) offers nonlinearity. h u consists of both user u's and his friends' embeddings, allowing the smoothed embedding not only to retain user u's own unique interest, but also to integrate his friends' interests. in this way, we can learn different patterns of each user's interest correlation, e.g., some users barely refer to their friends, while some users often refer to a few friends' suggestions, etc. in the recommendation layer, we use the user's smoothed embedding, i.e., h u , to make recommendations. denote p ui as user u's preference to item i, and we compute p ui as follows: where z i denotes item i's embedding. we define two types of objective functions according to the feedback type, including implicit feedback, e.g., users' check-in counts at pois, and explicit feedback, e.g., users' rating scores to items. first, we define the objective function in a ranking manner. for each positive feedback (u, i), we randomly select c negative samples from the item set i with item i excluded and denote the set of negative samples as neg(i). the objective function is defined as follows: the ranking-based objective function can be applied to both explicit feedback and implicit feedback. second, we define a square error-based objective function for explicit feedback only, in order to predict a user's rating score to an item: where y ui is user u's true rating score for item i. finally, for user u, we compute his preference to each item in i according to eq. , and take the top n items as the recommendation list. we use two real-world datasets collected from gowalla [ ] and epinions [ ] respectively for evaluation. gowalla is a location-based social network (lbsn), and we utilize users' check-ins at points-of-interest (pois) and the social network to make poi recommendation [ ]. there are , , check-ins generated by , users over , pois in the gowalla dataset. the total number of users' friendship records is , . epinions is a general consumer review site where users can review items. different from gowalla, where connections between users are two-way, users' social relationships in epinions form their web of trust, which is a one-way connection, like twitter followings. we adapt our model to this different structure, and utilize users' rating histories and the trust network to make item recommendation [ ]. the epinions dataset consists of , users who rated a total of , different items at least once. the total number of reviews is , . the total number of issued trust statements is , . for each user u, we partition his feedback set into three parts, i.e., % as training data, % as validation data, and % as testing data.
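continuing the sketch, the smoothing layer, the recommendation layer and the two objectives could be wired up roughly as follows. since only their intent is described above, the exact forms (adding the attention-weighted friend embeddings to t u before a sigmoid, an inner product for p ui , and a bpr-style log-sigmoid ranking loss over the c negative samples) are assumptions for illustration rather than the authors' exact equations.

```python
import torch
import torch.nn.functional as F

def smooth_user_embedding(t_u, friend_embs, alpha):
    """h_u = sigma(t_u + sum_v alpha_uv * t_v), with * the element-wise product (assumed form)."""
    aggregated = (alpha * friend_embs).sum(dim=0)  # (d,) weighted sum of friend embeddings
    return torch.sigmoid(t_u + aggregated)

def predict(h_u, z_i):
    """preference score p_ui as the inner product of the smoothed user and item embeddings."""
    return torch.dot(h_u, z_i)

def ranking_loss(h_u, z_pos, z_negs):
    """ranking objective for one positive item and c sampled negatives (bpr-style, assumed)."""
    pos_score = torch.dot(h_u, z_pos)
    neg_scores = z_negs @ h_u  # (c,)
    return -F.logsigmoid(pos_score - neg_scores).mean()

def rating_loss(h_u, z_i, y_ui):
    """square error objective for explicit feedback (rating prediction)."""
    return (predict(h_u, z_i) - y_ui) ** 2
```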
to evaluate the recommendation performance, we use two widely-used metrics on both datasets, namely, precision@n and recall@n, where n is the number of items in the recommendation list. they are computed as follows: where p n u is the set of top n items in user u's recommendation list, and t u is user u's ground truth set of items. |x| denotes the cardinality of set x. for each metric, we consider values (i.e., , , , ) of n in our experiments. for epinions, we also evaluate the prediction on users' explicit rating scores with mae, and it is computed as follows: where y ui is the true ratings given by user u for item i, and abs(·) is the absolute value function. many existing methods are available for poi recommendation, and it is impossible to list all of them as baselines. here, we select the baselines which serve as representative works of memory-based and model-based social recommendation methods. the baselines include: -socf [ ] : socf is a social-based collaborative filtering method, which recommend friends' visited items to users. -soreg [ ] : soreg defines individual-based regularization with pearson correlation coefficient (pcc) in traditional matrix factorization model. the pcc-version regularization achieves the best performance, compared with other variants, as reported in [ ] . -locabal [ ] : locabal takes advantage of both local friends and users with high reputations for social recommendation. -ptpmf [ ] : ptpmf is a probabilistic matrix factorization model that incorporates the distinction of strong and weak ties. -asmf/armf [ ] : asmf and armf argument user-item matrix using friends' visited items as potential items. asmf optimizes a square-loss based matrix factorization model with potential items being assigned a score lower than a user's own visited items. armf optimizes a ranking-based matrix factorization model which assumes that users' preference to different items are: visited items > friends' visited (potential) items > unvisited items. in the experiments, we add a l regularization term to the users' and items' embeddings when performing optimization, and the regularization coefficient is set as . . we set the negative slope β of the leakyrelu function as . . for all latent vectors, we set their dimension as n = . we set the negative count c as . the learning rate decreases from an initial value of . with the increase of iterations, and the decay factor is set as . . fig. present the precision@n and recall@n of all methods in comparison on the gowalla dataset and the epinions dataset respectively. it can be observed that the proposed dasr method achieves the best performance under different settings of n on both datasets and both metrics, which demonstrates the superiority of our method to these state-of-the-art methods. we take fig. as an example to make a detailed discussion. specifically, among these methods, socf is the only memory-based method, which directly recommend friends' items to users. performance of socf is worse than other model-based methods, which learn users' and items' embeddings. soreg integrates social relations as regularization term in matrix factorization model with feedback-based similarities as regularization coefficients, leading to that friends share similar embeddings. it achieves a good result. besides social relations as local context, locabal exploits extra social influence, i.e., users with high reputation as global social context. this makes locabal outperform soreg. 
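for reference, precision@n, recall@n and mae as used in this comparison follow their standard definitions and are computed per user and then averaged over all test users; a minimal sketch:

```python
def precision_at_n(recommended, ground_truth, n):
    """recommended: ranked list of item ids; ground_truth: iterable of held-out item ids."""
    hits = set(recommended[:n]) & set(ground_truth)
    return len(hits) / n

def recall_at_n(recommended, ground_truth, n):
    hits = set(recommended[:n]) & set(ground_truth)
    return len(hits) / max(len(set(ground_truth)), 1)

def mae(y_true, y_pred):
    """mean absolute error over explicit rating predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```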
ptpmf splits social relations as strong ties and weak ties, and distinguish the different influence of the two types of social ties. we can observe that the differentiation in ptpmf model benefit the recommendation performance. since armf's performance is better than asmf, we present armf only for comparison. it is observed armf is the best baseline method. this may profit from that armf introduce friends' visited items as potential items and it optimizes users' preference to items in a ranking manner. the proposed method, i.e., dasr, learns rich correlation patterns between users' interest by a correlation vector and finally beats these baselines. different from fig. , we can observe that locabal is better than ptpmf on the epinions dataset in fig. . this indicates that weak ties in epinions dataset do not provide valuable suggestions for users. feedback in the epinions dataset is users' rating scores for items. we also present rating prediction results of different methods in comparison, as shown in fig. . note that, we use asmf, rather than armf, since asmf is a square loss-based method and armf focuses on item ranking. it is observed that the proposed dasr achieves the best mae metric on the epinions dataset. by comparing all methods, we can find that the results of rating prediction is similar to those of recommendation results. the difference lies in that armf occupies the second best position in the comparison of recommendation results, while it is slightly better than socf only in the comparison of rating prediction. the reason may be: the value assigned to potential items cannot accurately express users' true preference and factorization on these potential values makes parameter learning deviate from a better direction. we learn an interest correlation vector for each pair of users with social relations. each dimension in the correlation vector represents the interest correlation in the same dimension of users' embeddings. in what follows, we study the interest correlation patterns between different pair of friends. for ease of exhibition, we set the dimension of users' embeddings as and train a new dasr model to get the attention weights between each user and his friends. we select two users from our gowalla dataset, and both them have friends. denote the two users as u and u respectively, we draw the heat map of the weights of attention vectors with their friends. figure shows u 's (left) and u 's (right) attention weights. for each user, we compute the norms of his correlation vectors with friends, and rearrange friend id in the descending order of the norm. we have the following observations: ( ) most dimensions of the attention vector between u and friend are large values, which indicate that they are very similar and we can recommend friend 's visited items to u . ( ) each dimension of the attention vector between u and friend is a small value, which indicates they have no similar interest and recommending friend 's visited items to u cannot achieve a good performance. ( ) by comparing u and u , we can find that each user's interest correlations with his friends have a specific patterns. u may mainly refer to suggestions of several friends with very strong interest correlation, while u may refer to suggestions of each friend dispersedly. these observations demonstrate that the proposed method outperforms baselines, and modeling users' multi-dimensional interest correlation can significantly improve recommendation performance. 
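the qualitative inspection described above (re-indexing a user's friends by the norm of their correlation vectors and drawing the attention weights as a heat map) can be reproduced with a short script; the plotting choices below are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_friend_correlations(alpha, title="attention weights"):
    """alpha: (n_friends, d) array of learned correlation vectors for one user."""
    order = np.argsort(-np.linalg.norm(alpha, axis=1))  # friends in descending order of norm
    plt.imshow(alpha[order], aspect="auto", cmap="viridis")
    plt.xlabel("embedding dimension")
    plt.ylabel("friend (re-indexed by correlation norm)")
    plt.title(title)
    plt.colorbar()
    plt.show()
```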
in this paper, we propose a new embedding-based social recommendation method. we use a correlation vector to characterize the high variation of users' interest correlations on all dimensions, and design a dimension-wise attention mechanism to learn the correlation vector. moreover, we use a user's friends' embeddings to smooth the user's embedding with the correlation vector as weights, and build strong and delicate unstructured social influence. experimental results on two realworld datasets collected from gowalla and epinions respectively demonstrate the superiority of our method to state-of-the-art methods. trust based recommender system for the semantic web modeling users' exposure with social knowledge influence and consumption influence for recommendation personalized recommendation of social software items based on social relations social recommendation with interpersonal influence a matrix factorization technique with trust propagation for recommendation in social networks social recommendation with cross-domain transferable knowledge an automatic weighting scheme for collaborative filtering point-of-interest recommendations: learning potential check-ins from friends learning to recommend with social trust ensemble sorec: social recommendation using probabilistic matrix factorization recommender systems with social regularization trust-aware recommender systems psrec: social recommendation with pseudo ratings trust in recommender systems session-based social recommendation via dynamic graph attention networks attentive recurrent social recommendation exploiting local and global social context for recommendation social recommendation: a review ule: learning user and location embeddings for poi recommendation exploiting poi-specific geographical influence for point-of-interest recommendation joint topic-semantic-aware social recommendation for online voting learning personalized preference of strong and weak ties for social recommendation social recommendation with strong and weak ties social recommendation with optimal limited attention network embedding based recommendation method in social networks npa: neural news recommendation with personalized attention social collaborative filtering by trust circle-based recommendation in online social networks factorization vs. regularization: fusing heterogeneous social relationships in top-n recommendation dual influence embedded social recommendation leveraging social connections to improve personalized ranking for collaborative filtering user preference learning for online social recommendation key: cord- - qys j u authors: zogan, hamad; wang, xianzhi; jameel, shoaib; xu, guandong title: depression detection with multi-modalities using a hybrid deep learning model on social media date: - - journal: nan doi: nan sha: doc_id: cord_uid: qys j u social networks enable people to interact with one another by sharing information, sending messages, making friends, and having discussions, which generates massive amounts of data every day, popularly called as the user-generated content. this data is present in various forms such as images, text, videos, links, and others and reflects user behaviours including their mental states. it is challenging yet promising to automatically detect mental health problems from such data which is short, sparse and sometimes poorly phrased. however, there are efforts to automatically learn patterns using computational models on such user-generated content. 
while many previous works have largely studied the problem on a small-scale by assuming uni-modality of data which may not give us faithful results, we propose a novel scalable hybrid model that combines bidirectional gated recurrent units (bigrus) and convolutional neural networks to detect depressed users on social media such as twitter-based on multi-modal features. specifically, we encode words in user posts using pre-trained word embeddings and bigrus to capture latent behavioural patterns, long-term dependencies, and correlation across the modalities, including semantic sequence features from the user timelines (posts). the cnn model then helps learn useful features. our experiments show that our model outperforms several popular and strong baseline methods, demonstrating the effectiveness of combining deep learning with multi-modal features. we also show that our model helps improve predictive performance when detecting depression in users who are posting messages publicly on social media. mental illness is a serious issue faced by a large population around the world. in the united states (us) alone, every year, a significant percentage of the adult population is affected by different mental disorders, which include depression mental illness ( . %), anorexia and bulimia nervosa ( . %), and bipolar mental illness ( . %) [ ] . sometimes mental illness has been attributed to the mass shooting in the us [ ] , which has taken numerous innocent lives. one of the common mental health problems is depression that is more dominant than other mental illness conditions worldwide [ ] . the fatality risk of suicides in depressed people is times higher than the general population [ ] . diagnosis of depression is usually a difficult task because depression detection needs a thorough and detailed psychological testing by experienced psychiatrists at an early stage [ ] . moreover, it is very common among people who suffer from depression that they do not visit clinics to ask help from doctors in the early stages of the problem [ ] . however, it is common for people who suffer from mental health problems to often "implicitly" (and sometimes even "explicitly") disclose their feelings and their daily struggles with mental health issues on social media as a way of relief [ , ] . therefore, social media is an excellent resource to automatically help discover people who are under depression. while it would take a considerable amount of time to manually sift through individual social media posts and profiles to locate people going through depression, automatic scalable computational methods could provide timely and mass detection of depressed people which could help prevent many major fatalities in the future and help people who genuinely need it at the right moment. the daily activities of users on social media could be a gold-mine for data miners because this data helps provide rich insights on user-generated content. it not only helps give them a new platform to study user behaviour but also helps with interesting data analysis, which might not be possible otherwise. mining users' behavioural patterns for psychologists and scientists through examining their online posting activities on multiple social networks such as facebook, weibo [ , ] , twitter, and others could help target the right people at right time and provide urgent crucial care [ ] . 
there are existing startup companies such as neotas with offices in london and elsewhere which mines publicly available user data on social media to help other companies automatically do the background check including understanding the mental states of prospective employees. this suggests that studying the mental health conditions of users online using automated means not only helps government or health organisations but it also has a huge commercial scope. the behavioural and social characteristics underlying the social media information attract many researchers' interests from different domains such as social scientists, marketing researchers, data mining experts and others to analyze social media information as a source to examine human moods, emotions and behaviours. usually, depression diagnosis could be difficult to be achieved on a large-scale because most traditional ways of diagnosis are based on interviews, questionnaires, self-reports or testimony from friends and relatives. such methods are hardly scalable which could help cover a larger population. individuals and health organizations have thus shifted away from their traditional interactions, and now meeting online by building online communities for sharing information, seeking and giving the advice to help scale their approach to some extent so that they could cover more affected population in less time. besides sharing their mood and actions, recent studies indicate that many people on social media tend to share or give advice on health-related information [ , , , ] . these sources provide the potential pathway to discover the mental health knowledge for tasks such as diagnosis, medications and claims. detecting depression through online social media is very challenging requiring to overcome various hurdles ranging from acquiring data to learning the parameters of the model using sparse and complex data. concretely, one of the challenges is the availability of the relevant and right amount of data for mental illness detection. the reason why more data is ideal is primarily that it helps give the computational model more statistical and contextual information during training leading to faithful parameter estimation. while there are approaches which have tried to learn a model on a small-scale data, the performance of these methods is still sub-optimal. for instance, in [ ] , the authors tried crawling tweets that contain depression-related keywords as ground truth from twitter. however, they could collect only a limited amount of relevant data which is mainly because it is difficult to obtain relevant data on a large-scale quickly given the underlying search intricacies associated with the twitter application programming interface (api) and the daily data download limit. despite using the right keywords the service might return several false-positives. as a result, their model suffered from the unsatisfactory quantitative performance due to poor parameter estimation on small unreliable data. the authors in [ ] also faced a similar issue where they used a small number of data samples to train their classifier. as a result, their study suffered from the problem of unreliable model training using insufficient data leading to poor quantitative performance. in [ ] the authors propose a model to detect anxious depression of users. they have proposed an ensemble classification model that combines results from three popular models including studying the performance of each model in the ensemble individually. 
to obtain the relevant data, the authors introduced a method to collect their data set quickly by choosing the first randomly sampled users who are followers of ms india student forum for one month. a very common problem faced by the researchers in detecting depression on social media is the diversity in the user's behaviours on social media, making extremely difficult to define depressionrelated features to cope with mental health issues. for example, it was evidenced that although social media could help us to gather enough data through which useful feature engineering could be effectively done and several user interactions could be captured and thus studied, it was noticed in [ , ] that one could only obtain a few crucial features to detect people with eating disorders. in [ ] the authors also suffered from the issue of inadequate features including the amount of relevant data set leading to poor results. different from the above works, we have proposed a novel model that is trained on a relatively large dataset showcasing that the method scales and it produces better and reliable quantitative performance than existing popular and strong comparative methods. we have also proposed a novel hybrid deep learning approach which can capture crucial features automatically based on data characteristic making the approach reliable. our results show that our model outperforms several state-of-the-art comparative methods. depressed users behave differently when they interact on social media, producing rich behavioural data, which is often used to extract various features. however, not all of them are related to depression characteristics. many existing studies have either neglected important features or selected less relevant features, which mostly are noise. on the other hand, some studies have considered a variety of user behaviour. for example, [ ] is one such work that has collected a large-scale dataset with reliable ground truth labels. they then extracted various features representing user behaviour in social media and grouped these features into several modalities. finally, they proposed a new model called the multimodal dictionary learning model (mdl) to detect depressed users from tweets, based on dictionary learning. however, given the high-dimensional, sparse, figurative and ambiguous nature of tweet language use, dictionary learning cannot capture the semantic meaning of tweets. instead, word embedding is a new technique that can solve the above difficulties through neural network paradigms. hence, due to the capability of the word embedding for holding the semantic relationship between tweets and the knowledge to capture the similarity between terms, we combine multi-modal features with word embedding, to build a comprehensive spectrum of behavioural, lexical, and semantic representations of users. recently, using deep learning to gain insightful and actionable knowledge from complex and heterogeneous data has become mainstream in ai applications for healthcare, e.g. the medical image processing and diagnosis has gained great success. the advantage of deep learning sits in its outstanding capability of iterative learning and automated optimizing latent representations from multi-layer network structure [ ] . this motivates us to leverage the superior neural network learning capability with the rich and heterogeneous behavioural patterns of social media users. 
to be specific, this work aims to develop a new novel deep learning-based solution for improving depression detection by utilizing multi-modal features from diverse behaviour of the depressed user in social media. apart from the latent features derived from lexical attributes, we notice that the dynamics of tweets, i.e. tweet timeline provides a crucial hint reflecting depressed user emotion change over time. to this end, we propose a hybrid model comprising bidirectional gated recurrent unit (bigru) and conventional neural network (cnn) model to boost the classification of depressed users using multi-modal features and word embedding features. the model can derive new deterministic feature representations from training data and produce superior results for detecting depression-level of twitter users. our proposed model uses a bigru, which is a network that can capture distinct and latent features, as well as long-term dependencies and correlations across the features matrix. bigru is designed to use backward and forward contextual information in text, which helps obtain a user latent feature from their various behaviours by using a reset and update gates in a hidden layer in a more robust way. in general, gru-based models have shown better effectiveness and efficiency than the other recurrent neural networks (rnn) such as long short term memory (lstm) model [ ] . by capturing the contextual patterns bidirectionally helps obtain a representation of a word based on its context which means under different contexts, a word could have different representation. this indeed is very powerful than other techniques such as traditional unidirectional gru where one word is represented by only one representation. motivated by this we add a bidirectional network for gru that can effectively learn from multi-modal features and provide a better understanding of context, which helps reduce ambiguity. besides, bigru can extract more discrete features and helps improve the performance of our model. the bigru model could capture contextual patterns very well, but lacks in automatically learning the right features suitable for the model which would play a crucial role in predictive performance. to this end, we introduce a one-dimensional cnn as a new feature extractor method to classify user timeline posts. our full model can be regarded as a hybrid deep learning model where there is an interplay between a bigru and a cnn model during model training. while there are some existing models which have combined cnn and birnn models, for instance, in [ ] the authors combine bilstm or bigru and cnn to learn better features for text classification using an attention mechanism for feature fusion, which is a different modelling paradigm than what is introduced in this work, which captures the multi-modalities inherent in data. in [ ] , the authors proposed a hybrid bigru and cnn model which later constrains the semantic space of sentences with a gaussian. while the modelling paradigms may be closely related with the combinations of a bigru and a cnn model, their model is designed to handle sentence sentiment classification rather than depression detection which is a much more challenging task as tweets in our problem domain are short sentences, largely noisy and ambiguous. in [ ] , the authors propose a combination of bigru and cnn model for salary detection but do not exploit multi-modal and temporal features. 
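to make the hybrid architecture discussed here concrete, the sketch below shows one plausible way to combine a bigru over a sequence of multi-modal user features with a one-dimensional cnn over word-embedded timeline posts in pytorch. the layer sizes, the fusion by concatenation, and the single-logit output are illustrative assumptions, not the exact configuration proposed in this paper.

```python
import torch
import torch.nn as nn

class HybridDepressionClassifier(nn.Module):
    """BiGRU over multi-modal feature sequences + 1-D CNN over embedded posts (illustrative)."""

    def __init__(self, modal_dim, emb_dim, vocab_size, hidden=64, n_filters=100, kernel=3):
        super().__init__()
        self.bigru = nn.GRU(modal_dim, hidden, batch_first=True, bidirectional=True)
        self.embedding = nn.Embedding(vocab_size, emb_dim)  # weights can be loaded from pre-trained vectors
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=kernel)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden + n_filters, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # one logit: depressed vs not depressed
        )

    def forward(self, modal_seq, post_tokens):
        # modal_seq: (batch, time_steps, modal_dim) multi-modal features per user
        # post_tokens: (batch, max_len) word indices from the user's timeline posts
        _, h = self.bigru(modal_seq)                              # h: (2, batch, hidden)
        gru_feat = torch.cat([h[0], h[1]], dim=1)                 # (batch, 2*hidden)
        emb = self.embedding(post_tokens).transpose(1, 2)         # (batch, emb_dim, max_len)
        conv_feat = torch.relu(self.conv(emb)).max(dim=2).values  # global max pooling over positions
        return self.classifier(torch.cat([gru_feat, conv_feat], dim=1))  # raw logits
```

loading a pre-trained embedding matrix into the nn.Embedding layer before training would mirror the use of pre-trained word embeddings described above.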
finally, we also studied the performance of our model when we used the two attributes word embedding and multi-modalities separately. we found that model performance deteriorated when we used only multi-modal features. we further show when we combined the two attributes, our model led to better performance. to summarize, our study makes the following contributions: ( ) we propose a novel depression detection framework by deep learning the textual, behavioural, temporal, and semantic modalities from social media. ( ) a gated recurrent unit to detect depression using several features extracted from user behaviours. ( ) we built a cnn network to classify user timeline posts concatenated with bigru network to identify social media users who suffer from depression. to the best of our knowledge, this is the first work of using multi-modalities of topical, temporal and semantic features jointly with word embeddings in deep learning for depression detection. ( ) the experiment results obtained on a real-world tweet dataset have shown the superiority of our proposed method when compared to baseline methods. the rest of our paper is organized as follows. section reviews the related work to our paper. section presents the dataset that used in this work, and different pre-processing we applied on data. section describes the two different attributes that we extracted for our model. in section , we present our model for detection depression. section reports experiments and results. finally, section concludes this paper. in this section, we will discuss closely related literature and mention how they are different from our proposed method. in general, just like our work, most existing studies focus on user behaviour to detect whether a user suffers from depression or any mental illness. we will also discuss other relevant literature covering word embeddings and hybrid deep learning methods which have been proposed for detecting mental health from online social networks and other resources including public discussion forums. since we also introduce the notion of latent topics in our work, we have also covered relevant related literature covering topic modelling for depression detection, which has been widely studied in the literature. data present in social media is usually in the form of information that user shares for public consumption which also includes related metadata such as user location, language, age, among others [ ] . in the existing literature, there are generally two steps to analyzing social data. the first step is collecting the data generated by users on networking sites, and the second step is to analyze the collected data using, for instance, a computational model or manually. in any data analysis, feature extraction is an important task because using only a relevant small set of features, one can learn a high-quality model. understanding depression on online social networks could be carried out using two complementary approaches which are widely discussed in the literature, and they are: • post-level behavioural analysis • user-level behavioural analysis methods that use this kind of analysis mainly target at the textual features of the user post that is extracted in the form of statistical knowledge such as those based on count-based methods [ ] . these features describe the linguistic content of the post which are discussed in [ , ] . for instance, in [ ] the authors propose classifier to understand the risk of depression. 
concretely, the goal of the paper is to estimate that there is a risk of user depression from their social media posts. to this end, the authors collect data from social media for a year preceding the onset of depression from user-profiles and distil behavioural attributes to be measured relating to social engagement, emotion, language and linguistic styles, ego network, and mentions of antidepressant medications. the authors collect their data using crowd-sourcing task, which is not a scalable strategy, on amazon mechanical turk. in their study, the crowd workers were asked to undertake a standardized clinical depression survey, followed by various questions on their depression history and demographics. while the authors have conducted thorough quantitative and qualitative studies, they are disadvantageous in that it does not scale to a large set of users and does not consider the notion of text-level semantics such as latent topics and semantic analysis using word embeddings. our work is both scalable and considers various features which are jointly trained using a novel hybrid deep learning model using a multi-modal learning approach. it harnesses high-performance graphics processing units (gpus) and as a result, has the potential to scale to large sets of instances. in hu et al., [ ] the authors also consider various linguistic and behavioural features on data obtained from social media. their underlying model relies on both classification and regression techniques for predicting depression while our method performs classification, but on a large-scale using a varied set of crucial features relevant to this task. to analyze whether the post contains positive or negative words and/or emotions, or the degree of adverbs [ ] used cues from the text, for example, i feel a little depressed and i feel so depressed, where they capture the usage of the word "depressed" in the sentences that express two different feelings. the authors also analyzed the posts' interaction (i.e., on twitter (retweet, liked, commented)). some researchers studied post-level behaviours to predict mental problems by analysing tweets on twitter to find out the depression-related language. in [ ] , the authors have developed a model to uncover meaningful and useful latent structure in a tweet. similarly, in [ ] , the authors monitored different symptoms of depression that are mentioned in a user's tweet. in [ ] , they study users' behaviour on both twitter and weibo. to analyze users' posts, they have used linguistic features. they used a chinese language psychological analysis system called textmind in sentiment analysis. one of the interesting post-level behavioural studies was done by [ ] on twitter by finding depression relevant words, antidepressant, and depression symptoms. in [ ] the authors used postlevel behaviour for detecting anorexia; they analyze domain-related vocabulary such as anorexia, eating disorder, food, meals and exercises. there are various features to model users in social media as it reflects overall behaviour over several posts. different from post-level features extracted from a single post, user-level features extract from several tweets during different times [ ] . it also extracts the user's social engagement presented on twitter from many tweets, retweets and/or user interactions with others. generally, posts' linguistic style could be considered to extract features [ , , ] . 
the authors in [ ] extracted six depression-oriented feature groups for a comprehensive description of each user from the collected data set. the authors used the number of tweets and social interaction as social network features. for user profile features, they have used user shared personal information in a social network. analysing user behaviour looks useful for detecting eating disorder. in wang et al., [ ] they extracted user engagement and activities features on social media. they have extracted linguistic features of the users for psychometric properties which resembles the settings described in [ , , ] where the authors have extracted features from two different social networks (twitter and weibo). they extracted features from a user profile, posting time and user interaction feature such as several followers and followee. this is one interesting work [ ] where the authors combine user-level and post-level semantics and cast their problem as a multiple instance learning setup. the advantage that this method has is that it can learn from user-level labels to identify post-level labels. there is an extensive literature which has used deep learning for detecting depression on the internet in general ranging from tweets to traditional document collection and user studies. while some of these works could also fall in one of the categories above, we are separately presenting these latest findings which use modern deep learning methods. the most closely related recent work to ours is [ ] where the authors propose a cnn-based deep learning model to classify twitter users based on depression using multi-modal features. the framework proposed by the authors has two parts. in the first part, the authors train their model in an offline mode where they exploit features from bidirectional encoder representations from transformers (bert) [ ] and visual features from images using a cnn model. the two features are then combined, just as in our model, for joint feature learning. there is then an online depression detection phase that considers user tweets and images jointly where there is a feature fusion at a later stage. in another recently proposed work [ ] , the authors use visual and textual features to detect depressed users on instagram posts than twitter. their model also uses multi-modalities in data, but keep themselves confined to instagram only. while the model in [ ] showed promising results, it still has certain disadvantage. for instance, bert vectors for masked tokens are computationally demanding to obtain even during the fine-tuning stage, unlike our model which does not have to train the word embeddings from scratch. another limitation of their work is that they obtain sentence representations from bert, for instance, bert imposes a token length limit where longer sequences are simply truncated resulting in some information loss, where our model has a much longer sequence length which we can tune easily because our model is computationally cheaper to train. we have proposed a hybrid model that considers a variety of features unlike these works. while we have not specifically used visual features in our work, using a diverse set of crucial relevant textual features is indeed reasonable than just visual features. of course, our model has the flexibility to incorporate a variety of other features including visual features. 
multi-modal features from the text, audio, images have also been used in [ ] , where a new graph attention-based model embedded with multi-modal knowledge for depression detection. while they have used temporal cnn model, their overall architecture has experimented on small-scale questionnaire data. for instance, their dataset contains sessions of interactions ranging between - min (with an average of min). while they have not experimented their method with short and noisy data from social media, it remains to be seen how their method scales to such large collections. xezonaki et al., [ ] propose an attention-based model for detecting depression from transcribed clinical interviews than from online social networks. their main conclusion was that individuals diagnosed with depression use affective language to a greater extent than those who are not going through depression. in another recent work [ ] , the authors discuss depression among users during the covid- pandemic using lstm and fasttext [ ] embeddings. in [ ] , the authors also propose a multi-model rnn-based model for depression prediction but apply their model on online user forum datasets. trotzek et al., [ ] study the problem of early detection of depression from social media using deep learning where the leverage different word embeddings in an ensemble-based learning setup. the authors even train a new word embedding on their dataset to obtain task-specific embeddings. while the authors have used the cnn model to learn high-quality features, their method does not consider temporal dynamics coupled with latent topics, which we show to play a crucial role in overall quantitative performance. the general motivation of word embeddings is to find a low-dimensional representation of a word in the vocabulary that signifies its meaning in the latent semantic space. while word embeddings have been popularly applied in various domains in natural language processing [ ] and information retrieval [ ] , it has also been applied in the domain of mental health issues such as depression. for instance, in [ ] , the authors study on reddit (reddit is also used in [ ] ) a few communities which contain discussions on mental health struggles such as depression and suicidal thoughts. to better model the individuals who may have these thoughts, the authors proposed to exploit the representations obtained from word embeddings where they group related concepts close to each other in the embeddings space. the authors then compute the distance between a list of manually generated concepts to discover how related concepts align in the semantic space and how users perceive those concepts. however, they do not exploit various multi-modal features including topical features in their space. farruque et al., [ ] study the problem of creating word embeddings in cases where the data is scarce, for instance, depressive language detection from user tweets. the underlying motivation of their work is to simulate a retrofitting-based word embedding approach [ ] where they begin with a pre-trained model and fine-tune the model on domain-specific data. gong et al., [ ] proposed a topic modelling approach to depression detection using multi-modal analysis. they propose a novel topic model which is context-aware with temporal features. 
while the model produced satisfactory results on the audio/visual emotion challenge (avec), the method does not use a variety of rich features and could face scalability issues, because simple posterior inference algorithms such as those based on gibbs or collapsed gibbs sampling do not parallelize, unlike deep learning methods, or one needs sophisticated engineering to parallelize such models. twitter has been popularly regarded as one online social media resource that provides free data for data mining on tweets. this is the reason for its popularity among researchers who have widely used data from twitter. one can freely and easily download tweet data through their apis. however, in the past, researchers have generally followed two methods for using twitter data, which are:
• using an already existing dataset shared freely and publicly by others. the downside of such datasets is that they might be too old to learn anything useful in the current context. recency may be crucial in some studies, such as understanding current trends of a recently trending topic [ ] .
• crawling data using a vocabulary from a social media network, which is slow but helps get fresh, relevant and reliable data that helps learn patterns currently being discussed on online social networks. this method takes time to collect relevant data and then process it, given that resources such as twitter, which provide data freely, impose tweet download restrictions per user per day as a result of the fair usage policy applied to all users. developing and validating the terms used in the vocabulary by users with mental illness is time-consuming but helps obtain a reliable list of words, by which reliable tweets could be crawled, reducing the number of false positives.
recent research conducted by the authors of [ ] is one such work that has collected large-scale data with reliable ground-truth labels, which we aim to reuse. we present the statistics of the data in table . to exemplify the dataset further, the authors collected three complementary data sets, which are:
• depression data set: each user is labelled as depressed, based on their tweet content between and . this includes , depressed users and , tweets.
• non-depression data set: each user is labelled as non-depressed and the tweets were collected in december . this includes over million active users and billion tweets.
• depression-candidate data set: users are labelled as depression-candidates; a tweet was collected if it contained the word "depress". this includes , depression-candidate users and over million tweets.
table . statistics of the large dataset collected by the authors in [ ] which is used in this study.
dataset          depressed      non-depressed
no. of users                    million
no. of tweets    ,              billion
data collection mechanisms are often loosely controlled, resulting in impossible data combinations (for instance, users labelled as depressed who have provided no posts), missing values, among others. after data has been crawled, it is still not ready to be used directly by the machine learning model due to various noise still present in the data, which is called the "raw data". the problem is even more exacerbated when data has been downloaded from online social media such as twitter, because tweets may contain spelling and grammar mistakes, smileys, and other undesirable characters. therefore, a pre-processing strategy is needed to ensure satisfactory data quality for the computational model to achieve reliable predictive analysis.
the raw data used in this study has labels of "depressed" and "non-depressed", and is organised as follows: users: this data is packaged as one json file per user account describing details about the user such as the user id, number of followers, number of tweets, etc. note that json is a standard, popular data-interchange format which is easy for humans to read and write. timeline: this data package contains files with several tweets along with their corresponding metadata, again in json format. to further clean the data we used the natural language toolkit (nltk). this package has been widely used for text pre-processing [ ] and in various other works. it has also been widely used for removing common words such as stop words from text [ , , ] . we removed the common words from users' tweets (such as "the", "an", etc.) as these are not discriminative or useful enough for our model. these common words also increase the dimensionality of the problem, which can lead to the "curse-of-dimensionality" problem and may have an impact on the overall model efficiency. to further improve the text quality, we also removed non-ascii characters, a step which has been widely used in the literature [ ] . pre-processing and removal of noisy content helped get rid of plenty of noise in the dataset. we then obtained high-quality, reliable data which we could use in this study. besides, this distillation helped reduce the computational complexity of the model, because we only deal with informative data which is eventually used in modelling. we present the statistics of this distilled data below. to further mitigate the issue of sparsity in the data, we excluded those users who have posted less than ten posts and users who have less than followers; therefore we ended up with positive users and negative users. social media data conveys user contents, insights and emotions reflected in individuals' behaviour in the social network. this data shows how users interact with their connections. in this work, we collect information from each user and categorise it into two types of attributes, namely the multi-modal attribute and word embedding, as follows: we introduce the multi-modal attribute type with the goal of calculating, for each user, the attribute value corresponding to each modality. we estimate that the dimensionality over all modalities of interest is ; we mainly consider four major modalities, as listed below, and ignore two modalities due to missing values. these features are extracted for each user as follows: social information and interaction. from this attribute, we extract several features embedded in each user profile. these are features related to each user account as specified by each feature name. most of the features are directly available in the user data, such as the number of followers and friends, favourites, etc. moreover, the extracted features relate to user behaviour on their profile. for each user, we calculate their total number of tweets, the total length of all tweets and the number of retweets. we further calculate the posting time distribution for each user, by counting how many tweets the user published during each of the 24 hours of a day; hence it is a 24-dimensional integer array. to get the posting time for each tweet, we extract two digits as the hour information, then go through all tweets of each user and track the count of tweets posted in each hour of the day.
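a minimal pre-processing sketch, assuming nltk is installed; it mirrors the two cleaning steps described above (english stop-word removal and stripping of non-ascii characters). the function name and the simple whitespace tokenisation are ours and not taken from the paper.

```python
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOP_WORDS = set(stopwords.words("english"))

def clean_tweet(text):
    # drop non-ascii characters (emojis are handled separately as count features)
    text = text.encode("ascii", errors="ignore").decode()
    # drop common, non-discriminative words such as "the", "an", ...
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return " ".join(tokens)

print(clean_tweet("The weather is an absolute delight today"))
```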
emojis allow users to express their emotions through simple icons and non-verbal elements, and they are useful to catch the attention of the reader. emojis can give us a glimpse of the sentiment of any text or tweet, and they are helpful to differentiate between positive and negative sentiment text [ ] . user tweets contain a large number of emojis which can be classified into positive, negative and neutral. for each positive, neutral and negative type, we count their frequency in each tweet. then we sum up the numbers over each user's tweets, so the final output is three values corresponding to the positive, neutral and negative emojis used by the user. we also consider valence, arousal and dominance (vad) features; these features contain valence, arousal and dominance scores. in addition, we count first person singular and first person plural pronouns. using the affective norms for english words, a vad score for words is obtained. we create a dictionary with each word as a key and a tuple of its (valence, arousal, dominance) scores as value. next, we parse each tweet and calculate a vad score for it using this dictionary. finally, for each user, we add up the vad scores of that user's tweets to calculate the vad score for the user. topic modelling belongs to the class of statistical modelling frameworks which help in the discovery of abstract topics in a collection of text documents. it gives us a way of organising, understanding and summarising collections of textual information. it helps find hidden topical patterns, where the number of topics is specified by the user a priori. it can be defined as a method of finding groups of words (i.e. topics) from a collection of documents that best represent the latent topical information in the collection. in our work, we applied unsupervised latent dirichlet allocation (lda) [ ] to extract the most latent topic distribution from user tweets. to calculate topic-level features, we first consider the corpus of all tweets of all depressed users. next, we split each tweet into a list of words and assemble all words in decreasing order of their frequency of occurrence, and common english words (stopwords) are removed from the list. finally, we apply lda to extract the latent distribution over k = topics, where k is the number of topics; we have found experimentally that k = is a suitable value. while there are tuning strategies as well as strategies based on bayesian non-parametrics [ ] , we have opted for a simple, popular and computationally efficient approach which gives us the desired results. the next feature is the count of depression symptoms occurring in tweets, as specified in the nine groups of the dsm-iv criteria for a depression diagnosis. the symptoms are listed in appendix a. we count how many times the nine depression symptoms are mentioned by the user in their tweets. the symptoms are specified as a list of nine categories, each containing various synonyms for the particular symptom. we created a set of seed keywords for all these nine categories and, with the help of the pre-trained word embedding, we extracted similar terms to extend the list of keywords for each depression symptom. furthermore, we scan through all tweets, counting how many times a particular symptom is mentioned in each tweet.
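a minimal sketch of the lda-based topic feature extraction with gensim, assuming the tweets have already been cleaned; the toy corpus, the value of k and the number of training passes are illustrative placeholders, not the configuration used in the paper.

```python
from gensim import corpora, models

cleaned_tweets = ["feel tired sad lonely", "cannot sleep night again", "lost interest everything"]
docs = [t.split() for t in cleaned_tweets]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

k = 10  # number of topics, chosen experimentally in the paper
lda = models.LdaModel(corpus, num_topics=k, id2word=dictionary, passes=10)

# per-tweet topic distribution, usable as a topic-level feature vector
print(lda.get_document_topics(corpus[0], minimum_probability=0.0))
```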
we also focused on antidepressants: we created a lexicon of antidepressants from the "antidepressant" wikipedia page, which contains an exhaustive and regularly updated list of items, and we counted the number of antidepressant names mentioned. the medicine names are listed in appendix b. word embeddings are a class of representation learning models which find the underlying meaning of words in the vocabulary in some low-dimensional semantic space. their underlying principle is based on optimising an objective function which brings words that repeatedly occur together within a certain contextual window close to each other in the semantic space. the usual window size that works well in many settings is [ ] . a remarkable ability of these models is that they can effectively capture various lexical properties of natural language such as the similarity between words, analogies among words, and others. these models have become increasingly popular in the natural language processing domain and have been used as input to deep learning models. among the various word embedding models proposed in the literature, word2vec [ ] is one of the most popular techniques; it uses shallow neural networks to learn word embeddings and is a computationally efficient predictive model for learning word embeddings from raw text. word2vec takes a large corpus of text as its input and generates a vector space with a corresponding vector allocated to each specific word, such that words sharing common meanings in the corpus are located close to each other in the space. to learn the semantic meaning of the words posted by depressed users, we add a new attribute to extract more meaningful features. the count features in the multi-modalities attribute are useful and effective for extracting features from plain text; however, they cannot effectively capture the underlying semantics, structure, sequence and meaning in tweets. while count features are based on the independent occurrence of words in a text corpus, they cannot capture the contextual meaning of words in the text, which is effectively captured by word embeddings. motivated by this, we apply word embedding techniques to extract more meaningful features from every user's tweets and capture the semantic relationships among word sequences. we used the popular word2vec model [ ] with the 300-dimensional set of word embeddings pre-trained on the google news corpus to produce a matrix of word vectors. the skip-gram model is used to learn word vector representations, which are characterised by low-dimensional real-valued representations for each word. this is usually done as a pre-processing stage, after which the learned vectors are fed into a model. in this section, we describe our hybrid model that learns from multi-modal features. while various hybrid deep learning models have been proposed in the literature, our method is novel in that it learns multi-modal features which include topical features, as shown in figure . the joint learning mechanism learns the model parameters in a consolidated parameter space where different model parameters are shared during the training phase, leading to more reliable results. note that simple cascade-based approaches suffer from error propagation from one stage to the next [ ] .
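a minimal sketch of turning the pre-trained word2vec vectors into an embedding matrix with gensim; the file path and the toy vocabulary are placeholders, and out-of-vocabulary words are simply left as zero vectors.

```python
import numpy as np
from gensim.models import KeyedVectors

# load the pre-trained google news vectors (binary format); path is an assumption
w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

vocab = ["sad", "tired", "hopeless"]          # vocabulary built from user tweets
dim = w2v.vector_size
embedding_matrix = np.zeros((len(vocab), dim))
for i, word in enumerate(vocab):
    if word in w2v:                           # out-of-vocabulary words stay as zeros
        embedding_matrix[i] = w2v[word]
print(embedding_matrix.shape)
```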
at the end of the feature extraction step, we obtain the training data in the form of an embedding matrix for each user, representing the user timeline posts attribute. we also have a -dimensional vector of integers for each user, representing the multi-modalities attribute. due to the complexity of user posts and the diversity of user behaviour on social media, we propose a hybrid model based on a cnn combined with a bigru to detect depression through social media, as depicted in figure . for each user, the model takes two inputs, one for each attribute. first, the four-modality feature input that represents the user behaviour vector runs into the bigru, capturing distinct and latent features as well as long-term dependencies and correlations across the feature matrix. the second input represents each user's input tweets, which are replaced with their embeddings and fed to the convolution layer to learn representation features from the sequential data. the intermediate outputs of both attributes are concatenated into one single feature vector, which is fed into a sigmoid activation layer for prediction. in the following sections, we discuss the two existing separate architectures which are combined into a novel computational model for modelling spatial structures and multi-modalities. in particular, the model comprises a cnn network to learn the spatial structure from user tweets, and a framework to extract latent features from the multi-modalities attribute followed by the application of a bigru. an individual user's timeline comprises semantic information and local features. recent studies show that cnns have been successfully used for learning strong, suitable and effective feature representations [ ] . the effective feature learning capabilities of cnns make them an ideal choice to extract semantic features from a user post. in this work, we apply a cnn network to extract semantic information features from user tweets. the input to our cnn network is the embedded matrix layer with a sentence matrix, and the sentence is treated as a sequence of words s = [w_1, w_2, w_3, . . . , w_i]. each word w_i ∈ R^(1×d) is a row vector of the embedding matrix R^(w×d), where d represents the dimension of each word in the matrix and w represents the length, i.e. the number of words, of each user's posts. we set the size of each user sentence to between and words and consider on average only ten tweets per user. note that this size is much larger than what has been used in other recent closely-related models which are based on bert. also, we can train our model directly on the dataset, which helps create representations specific to our data in a computationally less demanding way, unlike bert-based models, which are both computationally and financially expensive to train and fine-tune. the input layer is attached to the convolution stage through three convolutional layers to learn n-gram features capturing word order, thereby capturing crucial text semantics which usually cannot be captured by a bag-of-words-based model [ ] . we use a convolution operation to extract features between words as follows: c_n = f(W · x_{n:n+h−1} + b), where f is a nonlinear function, W a filter weight matrix, b denotes a bias term and x_{n:n+h−1} a window of h words. here the convolution is applied to the window of word vectors, where the window size is h.
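before detailing the pooling and recurrent components, the following tf.keras sketch illustrates the two-branch idea described above: a cnn over the embedded timeline and a bigru over the multi-modalities vector, concatenated and fed to a sigmoid output. all sizes (sequence length, embedding dimension, modality dimension, filter and unit counts) are placeholders and not the exact configuration of the paper, and the additional recurrent layer the paper stacks after pooling is omitted for brevity.

```python
from tensorflow.keras import layers, Model

seq_len, emb_dim, mod_dim = 100, 300, 40

# branch 1: user timeline posts (word embeddings fed to a cnn)
text_in = layers.Input(shape=(seq_len, emb_dim), name="timeline_embeddings")
c = layers.Conv1D(filters=128, kernel_size=3, activation="relu")(text_in)
c = layers.MaxPooling1D(pool_size=2)(c)
c = layers.Flatten()(c)

# branch 2: multi-modalities attribute (behavioural features fed to a bigru)
mod_in = layers.Input(shape=(mod_dim, 1), name="multi_modalities")
g = layers.Bidirectional(layers.GRU(64))(mod_in)

# fusion and prediction
merged = layers.Concatenate()([c, g])
out = layers.Dense(1, activation="sigmoid")(merged)

model = Model(inputs=[text_in, mod_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

because both branches feed a single loss, their parameters are learned jointly in one consolidated training run, which is the design choice motivated above.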
the network then creates a feature map according to the following equation: c = [c_1, c_2, . . . , c_{w−h+1}]. the feature map output by the convolution layer is the input to the pooling layer, which is an important step to reduce the dimension of the space by selecting appropriate features. we use a max pooling layer to calculate the maximum value for every feature-map patch; the output of the pooling operation is generated as ĉ = max(c). we then add an lstm layer to create a stack of deep learning components and optimise the results. the recurrent neural network (rnn) is a powerful network for processing fixed input vectors in sequence, even if the data itself is non-sequential. models such as the bigru, gru and lstm fall in the class of rnns. the static attributes are input to the bigru. the gru is an alternative to the lstm which merges the forget gate and the input gate into a single update gate, and is therefore computationally more efficient than an lstm network due to the reduced number of gates. a gru can effectively and efficiently capture long-distance dependencies between features, but a one-way or unidirectional gru only partly captures the historical information of the features. moreover, for our static attributes, we would like to obtain information about the behavioural semantics of each user. to this end, we apply a bigru to combine the forward and backward directions for every input feature and capture the behavioural semantics in both directions. bidirectional models, in general, capture information of the past and the future, considering both past and future contexts, which makes them more powerful than unidirectional models [ ] . suppose the input which represents a user's behaviour is x_1, x_2, . . . , x_n. when we apply the traditional unidirectional gru, we have the following form: h_s = gru(x_s, h_{s−1}). a bidirectional gru actually consists of two gru layers, as in figure , introduced to obtain the forward and the backward information; the hidden layer has two output values, one for the backward pass and the other for the forward pass, and the computation can be described as follows: h→_s = gru(x_s, h→_{s−1}), h←_s = gru(x_s, h←_{s+1}), h_s = [h→_s ; h←_s], where x_s represents the input at step s, while h→_s and h←_s represent the hidden states of the forward and the backward gru at step s. each gru network is defined as follows: z_s = σ(w_z x_s + u_z h_{s−1}), r_s = σ(w_r x_s + u_r h_{s−1}), h̃_s = tanh(w_h x_s + u_h (r_s ⊙ h_{s−1})), h_s = (1 − z_s) ⊙ h_{s−1} + z_s ⊙ h̃_s. the gru network calculates the update gate z_s at time step s; this gate helps the model decide how much information obtained from the previous step should be passed on to the next step. the reset gate r_s is used to determine how much information from the past step needs to be forgotten; the gru uses this reset gate to save related information from the past in the candidate state h̃_s. lastly, the model calculates h_s, which holds all the information and passes it down the network. after we obtain the latent features from each model, we integrate and concatenate them into a feature vector which is input into an activation function for classification, as mentioned below. experiments and results: we compare our model with the following classification methods: • ∼mdl: the multimodal dictionary learning model (mdl) is designed to detect depressed users on twitter [ ] . it uses dictionary learning to extract latent data features and a sparse representation of each user. since we cannot get access to all the attributes used in [ ] , we implement mdl in our own way.
• svm: support vector machines are a class of machine learning models used in text classification which optimise a loss function that learns to draw a maximum-margin separating hyperplane between two sets of labelled data, e.g. between positively and negatively labelled data [ ] . it is one of the most popular classification algorithms. • nb: naive bayes is a family of probabilistic algorithms based on applying bayes' theorem with the "naive" assumption of conditional independence between features [ ] . while the suitability of the conditional independence assumption has been questioned by various researchers, these models surprisingly give strong performance when compared with many sophisticated models [ ] . for our experiments, we have used the datasets mentioned in section ( ). they provide a large amount of data, especially for the labelled negative and candidate positive sets. after pre-processing and extracting information from the raw data, we filter out the following datasets to perform our experiments: • number of users labelled positive: . • number of tweets from positive users: . • number of users labelled negative: . • number of tweets from negative users: . we then further excluded users who posted less than ten posts and users who have more than followers, ending up with a final dataset consisting of positive users and negative users. we adopt the ratio : to split our data into training and test sets. we used pre-trained word2vec embeddings trained on the google news corpus, which comprises billion words. we used python . . and tensorflow . . to develop our implementation. we set the embedding layer to be non-trainable, so that the feature representations, e.g. word vectors and topic vectors, are kept in their original form. we used one hidden layer and a max-pooling layer of size , which gave better performance in our setting. for the optimisation of both the bigru and the cnn, we used the adam optimisation algorithm. finally, we trained our model for iterations with a batch size of . this number of iterations was sufficient for the model to converge, and our experimental results further cement this claim, as we outperform existing strong baseline methods. we employ traditional information retrieval metrics such as precision, recall, f1 and accuracy, computed from the confusion matrix, to evaluate our model. a confusion matrix is a matrix used for evaluating classification performance; it is also called an error matrix because it shows the number of wrong predictions versus the number of right predictions in a tabulated manner. some important terminologies associated with computing the confusion matrix are the following: • p: an actual positive case, which is depressed in our task. • n: an actual negative case, which is not depressed in our task. • tn: the actual case is not depressed, and the prediction is not depressed as well. • fn: the actual case is depressed, but the prediction is not depressed. • fp: the actual case is not depressed, but the prediction is depressed. • tp: the actual case is depressed, and the prediction is depressed as well. based on the confusion matrix, we can compute the metrics as follows: precision = tp/(tp + fp), recall = tp/(tp + fn), f1 = 2 · precision · recall/(precision + recall), and accuracy = (tp + tn)/(tp + tn + fp + fn). in our experiments, we study our model attributes, including the quantitative performance of our hybrid model, using the multi-modalities attribute and the user's timeline semantic features attribute jointly.
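a minimal sketch of computing the four metrics directly from the confusion-matrix counts, assuming y_true and y_pred are binary arrays (1 = depressed, 0 = not depressed); the function name and the toy labels are ours.

```python
import numpy as np

def evaluate(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

print(evaluate([1, 0, 1, 1], [1, 0, 0, 1]))
```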
after grouping user behaviour on social media into the multi-modalities attribute (mm), we evaluate the performance of the model. first, we examine the effectiveness of using the multi-modalities attribute (mm) only, with different classifiers. second, we show how the model performance increases when we combine word embeddings with mm. we summarise the results in table and figure as follows: • naive bayes obtains the lowest f1 score, which demonstrates that this model is less capable of classifying tweets for depression detection than the other models. the reason for its poor performance could be that it is not robust enough to sparse and noisy data. • the ∼mdl model outperforms svm and nb and obtains better accuracy than these two methods. since it is a recent model specifically designed to discover depressed users, it has captured the intricacies of the dataset well and learned its parameters faithfully, leading to better results. • our proposed model improves depression detection by up to % in f1-score compared to the ∼mdl model, which suggests that our model outperforms a strong baseline. the reason why our model performs well is primarily that it leverages a rich set of features which are jointly learned through consolidated parameter estimation, resulting in a robust model. • we can also deduce from the table that our model consistently outperforms all existing, strong baselines. • furthermore, our model achieves the best performance with % in f1, indicating that combining a bigru with a cnn, applying the multimodal strategy together with the user timeline semantic features strategy, is sufficient to detect depression on twitter. to take a closer look at our model's performance and how it classifies the samples, we use the confusion matrix. for this, we import the confusion matrix module from sklearn, which helps us to generate the confusion matrix. we visualise the confusion matrix, which demonstrates how the classes are correlated and indicates the percentage of samples per class. we can observe from figure that our model effectively predicts non-depressed users (tn) and depressed users (tp). we have also compared the effectiveness of each of the two attributes of our model. to test the performance of the model with each attribute separately, we build the model so that it is fed with a single attribute and compare how it performs. first, we test the model using only the multi-modalities attribute; we can observe in the figure that the model performs less optimally when we use the bigru only. in contrast, the model performs better when we use only the cnn with the word embedding attribute. this signifies that extracting semantic information features from user tweets is crucial for depression detection. although the model using only the word embedding attribute outperforms the one using only the multi-modalities attribute, the true positive rates (sensitivity) of both attributes are close to each other, as we see from the precision scores of the bigru and the cnn. finally, we can see that the model performance increases when both the cnn and the bigru are combined, outperforming each attribute used independently. after depressed users are classified, we examined the most common depression symptoms among depressed users. in figure , we can see that symptom one (feeling depressed) is the most common symptom posted by depressed users. this shows that depressed users expose and post their depressive mood on social media more than any other symptom.
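a minimal sketch of generating and plotting the confusion matrix with sklearn, as mentioned above; the label arrays are toy placeholders standing in for the real test labels and model predictions.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_true = [1, 0, 1, 1, 0]   # 1 = depressed, 0 = non-depressed
y_pred = [1, 0, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(cm, display_labels=["non-depressed", "depressed"]).plot()
plt.show()
```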
besides that, other symptoms such as energy loss, insomnia, a sense of worthlessness and suicidal thoughts appear in more than % of the depressed users. to further investigate the five most prominent symptoms among depressed users, we collected all the tweets associated with these symptoms. then we created a tag cloud [ ] for each of these five symptoms, to determine the frequent words related to each symptom, as shown in figure , where words in a larger font are relatively more important than the rest of the same cloud representation. this cloud gives us an overview of all the words that occur most frequently within each of these five symptoms. in this paper, we propose a new model for detecting depressed users through social media analysis by extracting features from user behaviour and the user's online timeline (posts). we have used a real-world data set of depressed and non-depressed users and applied it to our model. we have proposed a hybrid model which is characterised by an interplay between the bigru and cnn models. we feed the multi-modalities attribute, which represents the user behaviour, into the bigru and the user timeline posts into the cnn to extract the semantic features. our experiments show that training this hybrid network improves classification performance and identifies depressed users, outperforming other strong methods. this work has great potential to be further explored in the future; for instance, we could enhance the multi-modalities features by using short-text topic modelling, e.g. by proposing a new variant of the biterm topic model (btm) [ ] capable of generating depression-associated topics as a feature extractor to detect depression. besides, one could use the recently proposed and popular word representation techniques, also known as pre-trained language models, such as deep contextualised word representations (elmo) [ ] and bidirectional encoder representations from transformers (bert) [ ] , and train them on a large corpus of depression-related tweets instead of using a pre-trained word embedding model. while such pre-trained language models introduce challenges because of the restriction they impose on the sequence length, studying them on this task would help unearth their pros and cons. eventually, our future work aims to detect other mental illnesses in conjunction with depression, to capture the complex mental issues which may pervade an individual's life.
references
diagnostic and statistical manual of mental disorders (dsm- ®)
towards using word embedding vector space for better cohort analysis
depressed individuals express more distorted thinking on social media
latent dirichlet allocation
methods in predictive techniques for mental health status on social media: a critical review
libsvm: a library for support vector machines
multimodal depression detection on instagram considering time interval of posts
empirical evaluation of gated recurrent neural networks on sequence modeling
predicting depression via social media
depression detection using emotion artificial intelligence
bert: pre-training of deep bidirectional transformers for language understanding
a depression recognition method for college students using deep integrated support vector algorithm
augmenting semantic representation of depressive language: from forums to microblogs
retrofitting word vectors to semantic lexicons
analysis of user-generated content from online social communities to characterise and predict depression degree
topic modeling based multi-modal depression detection
take two aspirin and tweet me in the morning: how twitter, facebook, and other social media are reshaping health care
natural language processing methods used for automatic prediction mechanism of related phenomenon
predicting depression of social media user on different observation windows
anxious depression prediction in real-time social data
rehabilitation of count-based models for word vector representations
text-based detection and understanding of changes in mental health
sensemood: depression detection on social media
supervised deep feature extraction for hyperspectral image classification
using social media content to identify mental health problems: the case of #depression in sina weibo
mental illness, mass shootings, and the politics of american firearms
advances in pretraining distributed word representations
rethinking communication in the e-health era
on discriminative vs. generative classifiers: a comparison of logistic regression and naive bayes
deep learning for depression detection of twitter users
depressive moods of users portrayed in twitter
glove: global vectors for word representation
deep contextualized word representations
identifying health-related topics on twitter
early risk detection of anorexia on social media
beyond lda: exploring supervised topic modeling for depression-related language in twitter
beyond modelling: understanding mental disorders in online social media
dissemination of health information through social networks: twitter and antibiotics
depression detection via harvesting social media: a multimodal dictionary learning solution
cross-domain depression detection via harvesting social media
multi-modal social and psycho-linguistic embedding via recurrent neural networks to identify depressed users in online forums
detecting cognitive distortions through machine learning text analytics
a comparison of supervised classification methods for the prediction of substrate type using multibeam acoustic and legacy grain-size data
sharing clusters among related groups: hierarchical dirichlet processes
understanding depression from psycholinguistic patterns in social media texts
utilizing neural networks and linguistic metadata for early detection of depression indications in text sequences
recognizing depression from twitter activity timelines
tag clouds and the case for vernacular visualization
detecting and characterizing eating-disorder communities on social media
topical n-grams: phrase and topic discovery, with an application to information retrieval
salary prediction using bidirectional-gru-cnn model
world health organization
estimating the effect of covid-19 on mental health: linguistic indicators of depression during a global pandemic
modeling depression symptoms from social network data through multiple instance learning
georgios paraskevopoulos, alexandros potamianos, and shrikanth narayanan: affective conditioning on hierarchical networks applied to depression detection from transcribed clinical interviews
a biterm topic model for short texts
semi-supervised approach to monitoring clinical depressive symptoms in social media
survey of depression detection using social networking sites via data mining
relevance-based word embedding
combining convolution neural network and bidirectional gated recurrent unit for sentence semantic classification
feature fusion text classification model combining cnn and bigru with multi-attention mechanism
graph attention model embedded with multi-modal knowledge for depression detection
medlda: maximum margin supervised topic models
depression and disclosure behavior via social media: a study of university students in china
key: cord- -cqkpi z authors: tajan, louis; westhoff, dirk title: approach for gdpr compliant detection of covid-19 infection chains date: - - journal: nan doi: nan sha: doc_id: cord_uid: cqkpi z while the prospect of tracking mobile device users is widely discussed all over european countries as a means to counteract covid-19 propagation, we propose a bloom filter based construction providing users' location privacy and preventing mass surveillance. we apply a solution based on the bloom filter data structure that allows a third party, a government agency, to perform some privacy-preserving set relations on a mobile telco's access logfile.
by computing set relations, the government agency, given the knowledge of two identified persons, obtains an instrument that provides a (possible) infection chain from the initial to the final infected user, no matter at which location on a worldwide scale they are. the benefit of our approach is that intermediate, possibly infected users can be identified and subsequently contacted by the agency. with such an approach, we state that solely the identities of possibly infected users will be revealed and the location privacy of all others will be preserved. to this extent, it meets the general data protection regulation (gdpr) requirements in this area. cases of the covid-19 disease have been reported in more than countries, and its spreading was characterised as a pandemic by the world health organization on . . . one of its multiple side effects is that european democracies are being challenged. indeed, several countries are collecting location-based data from their own citizens. a state of emergency for health reasons has been established in countries such as spain, portugal, france or switzerland. such a specific situation empowers a government to perform actions that it would normally not be allowed to undertake. for instance, in milano, italy, mobile network operators are providing information on users' traffic to public authorities. in germany, the issues of how and for which purpose to process location-based information are among the most discussed. indeed, efforts in germany regarding digital support to detect infection chains are twofold. first, an app, the corona-warn-app, has been deployed and downloaded more than million times in germany (population of approx. million). it consists of a bluetooth-based tracking app in which the smartphone of an infected user subsequently informs all devices which have been in proximity (within the beaconing reception range at some point in time in the past). such an approach is very vulnerable due to the requirement of continuously activated bluetooth. the recently published families of bias [ ] and blueborne [ ] attacks have shown that mobile devices with activated bluetooth can easily be subject to remote code execution, e.g. cve- - , cve- - or cve- - , and are classified as a severe risk. moreover, it has been pointed out that the harvesting of contacts via bluetooth with a tracking app only works properly if the app is continuously active in the foreground, and, moreover, that at least % of smartphone users need to download and continuously use it to indeed have an impact with respect to the identification of infection chains. second, telco operators would provide access logfiles of mobile network base stations to the rki (robert-koch-institute) to support inferring infection chains. on the contrary, the netherlands' government decided not to approve a general confinement, for the reason of it being incompatible with individual freedom. for these reasons, in the work at hand we attempt to propose a construction which combines the ability to help public authorities contain the virus spreading with the possibility to preserve the privacy of citizens. therefore, we concentrate on providing a privacy-preserving solution for the second effort currently pursued within germany. our proposed solution makes use of our previous works [ ] , [ ] , which allow a non-trusted third party to privately compute operations and relations on sets using the bloom filter data structure.
such a data structure allows one to represent a large set of elements in a simple array of bits, which provides obfuscation and privacy for the set. we recall that the gdpr's two main objectives are, firstly, to enhance the protection of personal data when processing them and, secondly, to empower the companies in charge of this processing procedure. even if this regulation does not apply to fields such as public health or national security [ ] , weaving the proposed bloom filter based private protocols into the investigation of infection chains would limit government agencies to solely identifying users with a high probability of being infected, instead of a massive data analysis of all mobile users. several approaches from related work allow one to perform computations on pseudonymised, obfuscated or even encrypted data without the need to discern them. we could list homomorphic encryption [ ] , [ ] or multi-party computation [ ] , [ ] , which represent the most investigated techniques. in [ ] , we applied our bloom filter based construction to several use cases of post-mortem mobile device tracking. in our former work [ ] , we have shown that this alternative approach based on bloom filters can be used to secure data while preserving the ability to perform relevant tests or computations on the private data. bloom filters have been used in many different scenarios, as presented in [ ] . for instance, kerschbaum directly encrypts the bloom filter with homomorphic encryption [ ] . in [ ] the authors applied bloom filters to key exchange mechanisms in a wireless sensor network (wsn) environment, while in [ ] the authors optimise the broadcasting of sensor nodes with the use of bloom filters. regarding the investigation of privacy-preserving location tracing solutions in the context of covid-19 spreading, we could mention the work of the pepp-pt consortium [ ] . this european team provides standards, technology and services to countries and developers with the objective of helping to stop the spreading of covid-19. a government agency, whose role is to reduce the spreading of the sars-cov-2 virus in its country, knows different pairs of infected persons (a, b). its objective here is to identify all the possible paths which relate user a to user b, considering the case where the infection of user b is a consequence of user a's infection. by retrieving all possible paths (surely, it could also turn out that no path exists and the infections of users a and b were unrelated), the agency can identify all the users within such a path that may also be infected by the virus and try to contact them. indeed, different mobile device users close to the same mobile base station at the same time could potentially spread the virus in case one of them is infected. to do so, the agency analyses connection data provided by a telco company. the connection logs are collected at the base stations which provide network access to the users' mobile devices. a. parties involved. four parties are involved in the scenario: users: could be infected by the sars-cov-2 virus. they connect to the base stations to access the mobile network. telco company: provides network access to the users via several base stations. it also provides log data from the network connections to government agencies. base stations: are distributed over several countries, provide network access to the users' mobile devices and collect connection data.
government agency: aims to identify "infection chains" in order to contact possibly infected users and counteract the covid-19 pandemic. for each base station j, the telco company first generates and initialises a fresh bloom filter bf_j, represented by an array of bits all set to 0. any time a user connects to the mobile network using base station j, the following connection information is aggregated and added to bf_j: (id_i, t1_i, t2_i), with id_i the user's credentials and t1_i and t2_i respectively the starting and ending times of the connection to the access point. such connection data should be considered sensitive regarding the location privacy of the users. as will be presented, we consider a bloom filter based approach which brings privacy to the stored data. indeed, on the one hand the base stations use usernames to characterise the users, and on the other hand only the telco company can generate and access the connection information from the base stations. c. proximity chain - infection chain. as a notation rule, we use ⟨ ⟩ to express proximity chains and [ ] for infection chains. a proximity chain consists of a list of users where two successive ones have been at the same location at the same time. to establish a proximity chain, these times of contact should be ordered. in other words, in the proximity chain ⟨a, d, f, e, b⟩, the time at which users a and d have been at the same location should precede the one for users d and f (i.e. t_{a,d} ≤ t_{d,f}). in addition to being a proximity chain, the list could also represent an infection chain. in this case, all the users composing the chain should have a probability of being infected pr(x_i) greater than a certain threshold tr. more concretely, an infection chain [a, x_1, . . . , x_n, b] is a proximity chain for which it holds that ∀ x_i : pr(x_i) > tr; otherwise it is solely a proximity chain. therefore, an infection chain [a, x_1, . . . , x_n, b] represents how the sars-cov-2 virus may have spread from an initially infected user a to a consecutively infected user b. it may happen that one or several subsets of a proximity chain ⟨a, x_1, . . . , x_n, b⟩ are considered infection chains, e.g. [a, x_1, . . . , x_i] and/or [x_j, . . . , b]. we consider the government agency to be the principal threat to the application's users. as we stated previously, even if the gdpr does not apply to public health security matters, we aim to apply limitations on government agencies, so that the agencies can only identify users with a high probability of being infected instead of performing a massive data analysis of all mobile users. as we will present in the following sections, having the telco company collude with the government would allow the agency to access personal data of all users; therefore we do not consider such an assumption. even in the absence of collusion, we note that users do not fully trust the telco company either: they seek to limit the collection of personal data by mobile devices as much as possible. we also consider that users do not trust any approach that requires bluetooth to be switched on continuously, since multiple types of attacks could occur, such as remote code execution via the bleedingbit vulnerabilities [ ] . as recently proposed in [ ] , the bloom filter construction allows one to privately represent sets of elements and at the same time enables performance-saving computation on them.
exactly due to this performance-saving privacy extension, we argue that our approach also suits massive data sets like mobile access logfiles. next, we give some background on bloom filters and the relevant set relation, and recall the basic protocol's functions. a bloom filter is a data structure introduced by burton howard bloom in [ ] . it is used to represent a set of elements: with a bloom filter representing a certain set, one can verify whether an element is a member of this set. such a data structure consists of an array of m bits which is associated with k public hash functions. at first, all m bits are initialised to 0. to add an element to the bloom filter, one computes the hashes of this element with each of the k respective hash functions and then sets the bit to 1 at each position corresponding to a hash value. to test whether an element is included in the bloom filter, one similarly computes the respective hash values of this element and verifies whether the respective bits are set to 1. if at least one of these bits is set to 0, then we know for sure that the tested element is not a member of the set represented by the bloom filter (i.e. no false negative can occur when testing an element). on the contrary, with some (minor) probability, the testing function could return a false positive: even if all the verified bits are set to 1, the tested element may not be part of the set represented by the bloom filter. multiple types of operations can be performed on sets. for privacy concerns it can be of interest to solely reveal the cardinality of the resulting set instead of its content. therefore, we propose a solution on adapted bloom filters (see iii-c) for one kind of set relation, namely inclusiveness, defined as follows: definition (inclusiveness): let a and b be finite sets. we consider a included in b, i.e. a ⊆ b, iff all elements from a are included in b: ∀a ∈ a : a ∈ b. to guarantee full privacy of the sets' content along with their cardinality, we proposed in [ ] to modify the bloom filter approach in two aspects. firstly, instead of using k public hash functions, we use a unique hmac function with k secret keys. secondly, the exact value of k is kept secret and is privately and randomly generated within two publicly known boundaries. we specify the functions regarding the initialisation phase and the inclusiveness protocol. we then inspect the number of bits set to 1 in the resulting bloom filter: if it is equal to m, we can conclude that a ⊆ b, provided no false positive occurred; otherwise we know with certainty that a ⊈ b. for an evaluation of the correctness and the security of this protocol, we refer the reader to [ ] . it is shown there that a proper selection of the parameters m and k, considering the number of elements to be inserted, limits the number of overlapping bits in the resulting bloom filter and enables the third party to correctly conclude on the inclusiveness property of the two sets. indeed, a too large number of overlapping bits in the resulting bloom filter would lead to a false negative. given any two infected users a and b, the government agency first aims to identify all the proximity chains ⟨a, id_{x_1}, . . . , id_{x_n}, b⟩. in our protocol, we recall that the telco company provides all the relevant bloom filters to the government agency.
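a minimal sketch of the keyed bloom filter idea: elements are hashed with an hmac under secret keys held only by the telco company, and inclusiveness of a in b is decided purely on the bit arrays. this is a simplification of the protocol in [ ] (the secret, randomised choice of k and the exact inclusiveness computation are omitted), and all names are ours.

```python
import hmac
import hashlib

class KeyedBloomFilter:
    def __init__(self, m, secret_keys):
        self.m = m                      # number of bits
        self.keys = secret_keys         # k secret keys, known only to the telco company
        self.bits = [0] * m

    def _positions(self, element):
        # one hmac evaluation per secret key yields one bit position each
        for key in self.keys:
            digest = hmac.new(key, element.encode(), hashlib.sha256).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, element):
        for pos in self._positions(element):
            self.bits[pos] = 1

def included(bf_a, bf_b):
    # a ⊆ b (up to false positives) iff every bit set in bf_a is also set in bf_b
    return all(not a or b for a, b in zip(bf_a.bits, bf_b.bits))

keys = [b"k1", b"k2", b"k3"]
bf_station = KeyedBloomFilter(1024, keys)   # logfile of one base station
bf_station.add("id42|t1|t2")                # a connection record (id_i, t1_i, t2_i)
bf_query = KeyedBloomFilter(1024, keys)
bf_query.add("id42|t1|t2")                  # the record the agency asks about
print(included(bf_query, bf_station))       # True: the record is in the logfile
```

because the hash positions depend on the secret keys, a party without the keys cannot forge a query filter for an element of its own choice, which is the privacy argument made above.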
we propose to dissociate three cases, ranging from a direct contact between a and b to the general chain with several intermediate users. • case : a single intermediate user ⟨a, id_{x_1}, b⟩: we remark here that we know users a and b, but we know neither user x_1 nor his access credential id_{x_1}, so the government agency has to search over all base stations for all users x_j for which the two corresponding inclusiveness tests inc hold (one pairing a with x_j, the other pairing x_j with b at a later time). if pr(id_{x_1}) > tr, we can denote [a, id_{x_1}, b]. • case : the general case ⟨a, id_{x_1}, . . . , id_{x_n}, b⟩: our solution consists of having the government agency build a data tree structure representing all the proximity chains starting from user a. from this tree, the agency can easily identify the proximity chains from user a to user b. for the next step of the protocol, the government agency has to evaluate each chain to determine its plausibility of actually being an infection chain. we give the outline of this step but not its evaluation function, which we leave to the epidemiologists. we emphasise that at this point the proximity or infection chains will only reveal the usernames of users x_1, . . . , x_n and not their real identities. to obtain a proximity tree, the government agency starts by creating an empty tree t with user a as root. then, it processes the recursive algorithm prox tree(a, a, b, t) presented in algorithm , with t the time from when user a could have started the infection process. the recursive algorithm works as follows: first, it generates the list bs_n of base stations that the current node n has been connected to at a time later than t. to test whether a user n has been connected to a base station j (i.e. to test whether (id_n, t1_j, t2_j) ∈ bf_j), the government agency receives from the telco company the bloom filters composed of each of the 3-tuples (id_n, t1_j, t2_j). then, the government agency performs the inclusiveness test between the received bloom filters and bf_j, the bloom filter corresponding to the connection logfile from bs_j, as inc(bf_{n,j}, bf_j). the next step of the algorithm consists of identifying all the users that visited the base stations from the set bs_n at the same moment as user n. as before, the telco company generates bloom filters with the 3-tuples (id_l, t1_l, t2_l) for all users l and all time ranges [t1_l; t2_l] that overlap the connection time of user n. to determine which users should be listed, the government agency applies the inclusiveness operator between these bloom filters and bf_n, the one composed of the elements from bs_n. finally, every identified user is added to the proximity tree t as a leaf of the current node n, and the algorithm is then recursively processed on the leaves. an additional aspect to take into account while recursively processing the algorithm is to consider the upper nodes of the current node in the proximity tree. indeed, we would like to avoid creating loops in the tree, which are irrelevant when dealing with infection problems: if user a infected user c, it makes no sense to consider user c infecting user a within a short period of time. the algorithm should therefore exclude all users which are already inserted as upper nodes in the tree. regarding the tree construction, if we consider that user c has been in proximity of user a and id_c is added as a leaf of root a, user a should not be considered anymore as a potential leaf of node id_c, and so on. in figure we give a toy example of our recursive algorithm with seven users a, b, c, d, e, f, g, three base stations bs_j1, bs_j2, bs_j3 and times given as integers in [ ; ].
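before turning to the toy example, the following python sketch illustrates the recursive tree construction; the two helper functions stations_visited and contacts_at are our own placeholders for the bloom filter based inclusiveness tests that the agency performs with the telco-provided filters, and the sketch is not the paper's algorithm verbatim.

```python
# minimal sketch of the proximity-tree construction, under the stated assumptions:
#   stations_visited(user, t)     -> base stations the user connected to after time t
#   contacts_at(user, station, t) -> (other_user, contact_time) pairs overlapping the user
def prox_tree(node, a, b, t, tree, ancestors, stations_visited, contacts_at):
    if node == b:                          # reached the final infected user: stop this branch
        return
    for station in stations_visited(node, t):
        for other, contact_time in contacts_at(node, station, t):
            if other in ancestors:         # avoid loops: skip users already on this path
                continue
            tree.setdefault(node, []).append(other)
            prox_tree(other, a, b, contact_time, tree,
                      ancestors | {other}, stations_visited, contacts_at)

# usage: tree = {}; prox_tree("a", "a", "b", 0, tree, {"a"}, stations_visited, contacts_at)
```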
we show the content of the connection logfiles from the three base stations and the proximity tree from user a to user b that has been generated by computing prox tree(a, a, b, ). we observe in figure that two users might be in contact around different base stations; such contacts are covered by the recursive calls of prox tree(n, a, b, t). a) algorithm optimization: with respect to performance, one could consider computing the algorithm the opposite way, namely with input b as root. to do so, the algorithm has to be modified so that time is considered backwards: it starts at the ending time ( in our toy example) and builds the proximity tree by going back in time. we refer to this reversed recursive algorithm as reverse prox tree. in figure we show the proximity tree obtained after computing reverse prox tree(b, a, b, ) from user b, considering time backwards. as expected, the resulting proximity chains are the same as in figure , but we remark that the resulting tree is smaller than the one obtained in figure . in this specific toy example we notice that obtaining the proximity tree was faster with the reversed algorithm. (figure: example of a proximity tree obtained from reverse prox tree(b, a, b, ·) with the same toy example.) another aspect we could consider while comparing the two resulting trees is that the order in which the tree is built, and hence the order in which the proximity chains are obtained, is also reversed. indeed, in figure we first obtain ⟨a, c, g, b⟩, then ⟨a, g, b⟩ (via j ), ⟨a, g, b⟩ (via j ) and finally ⟨a, b⟩. in figure we see that we obtain the chains in the exact opposite order with reverse prox tree. still aiming to optimize the computation time of our algorithm, in particular when dealing with large numbers of users and base stations, one could simultaneously start the tree generation using the algorithm and its reversed version. in both cases the tree propagates, and every time we find a proximity chain in the tree (meaning n = b, or n = a for reverse prox tree) we store the chain in a set s_1 (or s_2 for reverse prox tree). then, for each round (i.e. per iteration) we test whether the two sets have a common element. if not, we continue; in case they have a common proximity chain, we can stop both algorithms, and the complete set of proximity chains from user a to user b is composed of the union of the sets s_1 and s_2. to illustrate why computing both versions at the same time gains performance, one could use the following analogy: • if you throw one stone into the water and you want the resulting waves to reach a point at a distance of r meters, then the circle at the end will encompass many square meters. • if you throw two stones into the water (one at the original position, the other one at the position you want to reach), the resulting wave propagations will intersect at a distance of approximately r/2 meters. • the added area of these two circles is much smaller than the circle's area obtained with one stone: with a = π × r², two circles of radius r/2 together cover only half the area of a single circle of radius r (2 · π(r/2)² = πr²/2). another level of optimization could be considered in order to identify some of the proximity chains faster, for instance to support the immediate start of a localized quarantine. instead of storing the chains into s_1 and s_2, at each propagation round we look at the chains while they are processed, so that we stop both algorithms when: • prox tree has built a path ⟨a, x_1, . . . , x_i⟩, • reverse prox tree has built a path ⟨b, x_n, . . . , x_j⟩, • and it holds that x_i == x_j. then the two parts can be concatenated to create the proximity chain ⟨a, x_1, . . . , x_i == x_j, . . . , x_n, b⟩. we can refer to table i to see that if we perform both algorithms at the same time in the toy example configuration, we retrieve the proximity chain ⟨a, c, g, b⟩ faster with this second level of optimization. in table i we observe in detail how we retrieve the proximity chains using the two versions of the algorithm and the optimization in the toy example's configuration. as stated previously, reverse prox tree(b, a, b, ) was executed much faster than prox tree(a, a, b, ): the original algorithm ended after rounds while the reverse one stopped after the th round. since it is not possible to predict which of the two will finish processing first, computing both in parallel optimizes the retrieval. as for the second level of optimization, concatenating two parts of proximity chains allows us to retrieve ⟨a, c, g, b⟩ at round , while it is discovered at round with prox tree and round with reverse prox tree. this is of value especially when proximity chains are composed of a high number of intermediate users. the performance gain obtained with our two levels of optimization is downplayed by the extreme smallness of the logfiles in our toy example; one can easily imagine that, applied to real-life scenarios and big data, these optimizations are highly performance saving. for example, in another scenario dealing with mobile connection logfiles [ ] , the authors propose to process such logfiles, and therefore bloom filters, with a very large number of elements. b) algorithm decentralization: the european pepp-pt consortium is advocating a decentralized approach, as is the dp-3t protocol [ ] which relies on bluetooth, and also [ ] where decentralization has been investigated. with our presented optimization, we could integrate such a construction by introducing two additional parties besides the ones already presented. we note that these two additional parties should be extremely powerful in terms of computation and able to perform parallel computing, such as server farms or clusters: • computing party 1, which runs prox tree, • computing party 2, which runs reverse prox tree. this way the agency only receives, per round, the values x_i (from computing party 1) and x_j (from computing party 2) and compares whether x_i == x_j. only in the case x_i == x_j does computing party 1 send ⟨a, x_1, . . . , x_i⟩ and computing party 2 send ⟨x_j, . . . , x_n, b⟩ to the agency. with such a construction, multiple parties are involved in the computation and the whole effort does not rest on the government agency. c) algorithm complexity: one can easily see, by analysing the results obtained in figures and , that the size of the resulting tree depends on the size of the base stations' logfiles. these logfiles naturally depend on the number of users, and thus connections, during the particular time interval. the more base stations and users there are, the more numerous and the fuller the logfiles will be. in our toy example, we have connection entries in all combined base stations, as displayed in figure . they result in trees with, respectively, and nodes when computing prox tree and reverse prox tree. we also recall that, in case we find the final user of the wanted infection chain (user b in our example) in the tree, the algorithm reaches a break instruction and the respective sub-tree is no longer explored.
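a minimal sketch of running the two tree constructions round by round and stopping as soon as their chain sets meet, as described above; forward_rounds and backward_rounds are assumed to be iterables yielding, per propagation round, the set of complete chains found so far by prox tree and reverse prox tree respectively (all names are ours).

```python
def bidirectional_chains(forward_rounds, backward_rounds):
    s1, s2 = set(), set()                  # chains found by prox_tree / reverse_prox_tree
    for forward, backward in zip(forward_rounds, backward_rounds):
        s1 |= forward
        s2 |= backward
        if s1 & s2:                        # a chain has been reached from both sides: stop
            break
    return s1 | s2                         # union of the chains found by both directions

# toy usage: the chain ("a", "c", "g", "b") is found by both sides in round 2
fwd = [set(), {("a", "c", "g", "b")}]
bwd = [set(), {("a", "c", "g", "b")}]
print(bidirectional_chains(fwd, bwd))
```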
a high activity of this particular user could then reduce the tree's spreading. as seen previously, one of the two algorithms will be faster to execute, without it being possible to predict which one, and applying the presented optimization reduces the complexity to that of the faster one. from all the proximity chains ⟨a, id_{x_1}, . . . , id_{x_n}, b⟩ obtained by performing the aforementioned protocol, the government agency should determine whether the users x_i might also be infected. to do so, the agency could estimate each user's probability of being infected and compare it to a threshold (i.e. pr(x_i) > tr). such a probability obviously depends, among others, on the respective neighbours within the chain. we consider the probability value to be computed by a function infection(previous node, contact time, contact distance, reproduction number, saturation), where saturation denotes the percentage of infected persons within the human population of a region, which obviously changes over time. more precisely, in germany the reproduction number r, which is defined as the mean number of people infected by a case, was at the beginning of the covid-19 crisis and by . . could be reduced to . (and meanwhile r = . ). clearly this number is only an average, but it still indicates that the inference from a proximity chain to an infection chain very much depends on the concrete time and location entities met during the pandemic wave. similar numbers also exist for other countries, for instance r = . for belgium at . . . another important observation is that, since a proximity chain can easily build up over a period of weeks, pr(x_i) may vary significantly. but only if all probabilities are larger than tr can the agency at least argue that it has identified a possible infection chain. it goes without saying that determining the infection function itself is out of scope here. on the one hand, specialists emphasise the high contagiousness of the virus; on the other hand, two users connecting to the same base station at the same time does not necessarily imply any physical contact between the two. without being able to determine the exact probability of a user being infected by another one, we can propose a model to evaluate the probability of a proximity chain being an infection chain. first, we know that users a and b are infected and we would like to determine whether user b has been infected due to user a or via another chain and other infection events. therefore, applying probability theory to such a problem is relevant and reflects the chain characteristic of it. we define pr(x_i) as the conditional probability p(x_i | x_{i−1}) of the event "x_{i−1} has infected x_i, knowing that x_{i−1} is already infected". it holds that pr(x_1 ∩ · · · ∩ x_n) = ∏_{i=1}^{n} pr(x_i). considering a proximity chain ⟨a, x_1, . . . , x_n, b⟩, there is a clear tendency that the overall probability of user b being infected due to user a is inversely proportional to the length of the proximity chain. this product is the probability model we propose for evaluating a proximity chain. the proximity tree obtained at the previous stage of the protocol contains nodes with users' credentials, and only these usernames are revealed. it is only in case a proximity chain turns out to be an infection chain that the agency will request from the telco company the real identities of the users composing the chain. therefore, users' identities are solely revealed in case the outcome of the infection function indicates so.
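a minimal sketch of the chain evaluation step described above: the per-link probabilities come from the infection function, which the text deliberately leaves to epidemiologists, so a constant toy function is used here; a chain is kept as an infection chain only if every link probability exceeds the threshold tr (all names are ours).

```python
def is_infection_chain(chain, infection, tr):
    # chain = [a, x1, ..., xn, b]; infection(prev, cur) ≈ pr(cur infected | prev infected)
    link_probs = [infection(prev, cur) for prev, cur in zip(chain, chain[1:])]
    chain_prob = 1.0
    for p in link_probs:
        chain_prob *= p                    # product over all links of the chain
    return all(p > tr for p in link_probs), chain_prob

# toy usage with a constant per-link probability of 0.4 and threshold 0.3
ok, p = is_infection_chain(["a", "x1", "x2", "b"], lambda u, v: 0.4, tr=0.3)
print(ok, p)  # the chain qualifies; the overall probability shrinks with chain length
```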
moreover, we recall that during the overall process no additional location information about other users listed in the mobile operator's logfiles is revealed to the agency. another way to tune prox_tree and make the overall computation more scalable is, during the computation of prox_tree and reverse_prox_tree, to only pursue paths in the proximity tree as long as they still satisfy the criterion for being an infection chain. this could consist of performing the probability test inside the algorithm and issuing a break instruction whenever the test is not fulfilled. one may also notice that a trivial optimization is to switch users a and b, in the sense of asking whether "the infection of user a is coming from user b". the corresponding figure shows the proximity tree obtained by computing prox_tree(b, b, a, ·) on our toy example logfiles; it results in a very different tree than the one obtained by prox_tree(a, a, b, ·). in case the government agency holds information on the infection times of users a and b, for example that user a was infected before user b, only one direction needs to be considered.

to be most effective, the government agency should perform a final step in the protocol: all users identified as infected at the previous stage (i.e. all x_i with Pr(x_i) > T_r) should themselves be treated as new users a and b in the proposed solution. indeed, our protocol is initiated with user tuples (a, b) already identified as infected by the agency; the freshly identified users thus extend the list of known infected persons, and the protocol should be applied to them to optimize the search. in this way, as many infected users as possible can be identified and contacted.

we argue that the proposed solution provides privacy for the users by three different means. the first is the use of personal credentials only as usernames. the second is the bloom filter construction and its obfuscation feature: as explained previously, the real identities of users are neither provided nor stored in the bloom filters or the logfiles. the telco company uses usernames to distinguish users, and the private mapping is provided to the government agency solely on demand, when a user is identified as being part of an infection chain. this second aspect of location privacy is given by the bloom-filter-based approach from [ ], which allows relations among logfiles to be computed while keeping the underlying data sets private. we recall that this approach uses an hmac function instead of a set of public hash functions, so only the telco company can create the bloom filters and no other party. consequently, the government agency cannot try to retrieve the locations of a specific user by generating a bloom filter containing a single element and testing the inclusion relation between this filter and the ones from the base stations. using secret keys to generate valid bloom filters therefore enhances the privacy of the protocol; these secret keys are generated and stored only on the telco company's side and are not required by the government agency to perform our protocol. the third aspect of location privacy consists in ensuring that no party other than the provider itself (which has this information anyway) obtains the users' location data. this can easily be achieved by not revealing which bloom filter bf_i comes from which base station bs_i.
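the following is a minimal python sketch of the keyed bloom filter construction mentioned above: because the index positions are derived from an hmac under the telco's secret key rather than from public hash functions, only the key holder can build or probe valid filters. the filter size, number of positions and helper names are illustrative assumptions, not parameters taken from the paper.

```python
import hashlib
import hmac

class KeyedBloomFilter:
    """Bloom filter whose index positions are derived from HMAC-SHA256 under a
    secret key held only by the telco, instead of public hash functions."""

    def __init__(self, secret_key: bytes, m: int = 4096, k: int = 4):
        self.key, self.m, self.k = secret_key, m, k
        self.bits = [0] * m

    def _positions(self, element: str):
        # derive k pseudo-random positions; without the key these cannot be computed
        for i in range(self.k):
            digest = hmac.new(self.key, f"{i}:{element}".encode(),
                              hashlib.sha256).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, element: str):
        for pos in self._positions(element):
            self.bits[pos] = 1

    def contains(self, element: str) -> bool:
        # only a party knowing the secret key can even formulate this test,
        # which is what prevents the agency from probing for single users
        return all(self.bits[pos] for pos in self._positions(element))
```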
with these measures in place, the only information revealed to the authority is the contact information of users having entered the same cell during the same time interval; providing the concrete location of this cell is entirely irrelevant for the authority's computation of the proximity resp. infection chain.

in this work we proposed to use the bloom filter approach from [ ] for a real-life use case, similarly to [ ], where we applied it to a post-mortem mobile device tracking scenario. our detailed protocol supports a government agency in tracking possible covid-19 infection chains and thereby identifying plausibly infected mobile users. throughout the entire protocol, the agency only handles usernames, which do not allow the users' identities to be retrieved, so their privacy is preserved. solely in the case of a possible infection with the life-threatening sars-cov-2 virus are real identities revealed to the agency, which can then contact the affected users and provide medical support. in this way, the telco companies act in a gdpr-compliant manner and can still guarantee a certain level of location privacy to their clients. we stress that if the "in proximity" data stem from the mobile telco's logfiles, it only means that two devices have been within the same transmission range of a base station; in the worst case they can still be up to a distance of 2×r apart (with r the transmission radius, i.e. easily several hundred metres or more). however, if the same approach is applied to the rssi-based swarm-mapping data collected by android or ios, then "in proximity" has a much better accuracy [ ]. in particular, the wifilocationharvest file of each mobile device contains timestamp, latitude, longitude, trip id, speed and course at an accuracy which comes close to that required to check whether two devices came within infection distance of each other. moreover, compared to the promoted app-based bluetooth approach of the german fraunhofer institutes and others, in the rssi-based approach the mobile's wlan and bluetooth can be switched off, and yet, simply due to the rssi measured from the access point, the approach provides the location data of devices equipped with such modern mobile operating systems. to conclude, our approach may be a good starting point for debating a reasonable gdpr-compliant detection of covid-19 infection chains, since we argue it does not introduce additional privacy leakage to any parties other than those who already have knowledge of the location data.
references:
- bias: bluetooth impersonation attacks
- bluetooth application-layer packet-filtering for blueborne attack defending
- private set relations with bloom filters for outsourced sla validation
- solving set relations with secure bloom filters keeping cardinality private
- the european union general data protection regulation: what it is and what it means
- efficient private matching and set intersection
- outsourced private set intersection using homomorphic encryption
- privacy-preserving set operations
- an efficient bloom filter based solution for multiparty private matching
- retrospective tracking of suspects in gdpr conform mobile access networks datasets
- optimizing bloom filter: challenges, solutions, and comparisons
- secure and efficient authenticated key exchange mechanism for wireless sensor networks and internet of things using bloom filter
- bloom filter based data collection algorithm for wireless sensor networks
- privacy-preserving proximity tracing
- bleedingbit: exposes enterprise access points and unmanaged devices to undetectable chip level attack
- space/time trade-offs in hash coding with allowable errors
- centralized or decentralized? the contact tracing dilemma
- standortlokalisierung in modernen smartphones: grundlagen und aktuelle entwicklungen

key: cord- -st ebdah authors: raskar, ramesh; schunemann, isabel; barbar, rachel; vilcans, kristen; gray, jim; vepakomma, praneeth; kapa, suraj; nuzzo, andrea; gupta, rajiv; berke, alex; greenwood, dazza; keegan, christian; kanaparti, shriank; beaudry, robson; stansbury, david; arcila, beatriz botero; kanaparti, rishank; pamplona, vitor; benedetti, francesco m; clough, alina; das, riddhiman; jain, kaushal; louisy, khahlil; nadeau, greg; penrod, steve; rajaee, yasaman; singh, abhishek; storm, greg; werner, john title: apps gone rogue: maintaining personal privacy in an epidemic date: - - journal: nan doi: nan sha: doc_id: cord_uid: st ebdah

containment, the key strategy in quickly halting an epidemic, requires rapid identification and quarantine of the infected individuals, determination of whom they have had close contact with in the previous days and weeks, and decontamination of locations the infected individual has visited. achieving containment demands accurate and timely collection of the infected individual's location and contact history. traditionally, this process is labor intensive, susceptible to memory errors, and fraught with privacy concerns. with the recent, almost ubiquitous availability of smartphones, many people carry a tool which can be utilized to quickly identify an infected individual's contacts during an epidemic, such as the current novel coronavirus crisis. unfortunately, the very same first-generation contact-tracing tools have been used to expand mass surveillance, limit individual freedoms and expose the most private details about individuals. we seek to outline the different technological approaches to mobile-phone-based contact tracing to date and elaborate on the opportunities and the risks that these technologies pose to individuals and societies. we describe advanced security-enhancing approaches that can mitigate these risks and describe trade-offs one must make when developing and deploying any mass contact-tracing technology. with this paper, our aim is to continue to grow the conversation regarding contact tracing for epidemic and pandemic containment and discuss opportunities to advance this space. we invite feedback and discussion.
containment, the key strategy in quickly halting an epidemic, requires rapid identification and quarantine of the infected individuals, determination of whom they have had close contact with in the previous days and weeks, and decontamination of locations the infected individual has visited. achieving containment demands accurate and timely collection of the infected individual's location and contact history. traditionally, this process is labor intensive, susceptible to memory errors, and fraught with privacy concerns. with the recent almost ubiquitous availability of smart-phones, many people carry a tool which can be utilized to quickly identify an infected individual's contacts during an epidemic, such as the current novel coronavirus (covid- ) crisis. unfortunately, the very same first-generation contacttracing tools can also be -and have been -used to expand mass surveillance, limit individual freedoms and expose the most private details about individuals. we seek to outline the different technological approaches to mobile-phone based contact-tracing to date and elaborate on the opportunities and the risks that these technologies pose to individuals and societies. we describe advanced security enhancing approaches that can mitigate these risks and describe trade-offs one must make when developing and deploying any mass contact-tracing technology. finally, we express our belief that citizen-centric, privacyfirst solutions that are open source, secure, and decentralized (such as mit private kit: safe paths) represent the nextgeneration of tools for disease containment in an epidemic or a pandemic. with this paper, our aim is to continue to grow the conversation regarding contact-tracing for epidemic and pandemic containment and discuss opportunities to advance this space. we invite feedback and discussion. infectious diseases spread in an exponential fashion. containment is an effective means to slow the spread, allowing health care systems the capacity to treat those infected. however, 'lock down' like containment can also disrupt the productivity of the population, distort the markets (limiting transportation and exchange of goods), and introduce fear and social isolation for those that are not yet infected or that have recovered from an infection. finally, and most importantly, contact tracing can be quickly deployed at the first warnings of an outbreak, but continues to be effective when disease resurgence concerns exist. thus, following an initial epidemic peak, contacttracing can be an effective means to enable disease decline and avoid multiple peak periods and disease resurgence. lessons from china have suggested the utility of understanding gps localization of intersections between known infected individuals and others in stemming infection progression. this is specifically related to the r (r naught) that determines how contagious an infectious disease is. r is a description of the average number of people who will catch a disease from one contagious person. ideally, a lower number will optimize reduction of disease spread, which will facilitate time to develop a vaccine or for the disease to die out. three factors that define r are the infectious period (which is generally fixed for a given disease), the contact rate (i.e., how many people come in contact with a contagious person), and the mode of transmission (which is similarly fixed for a given disease). thus, for a given disease, the most adjustable factor is the contact rate. 
one key issue with the contact rate is how to optimally enable individuals and societies to limit it. contact amongst uninfected individuals will not facilitate disease spread; a society and/or an individual is therefore principally concerned with understanding the contacts an infected individual has had. understanding whether paths have crossed between an infected individual and any number of other individuals allows those who have been exposed to be identified (so that they may be tested, enabling appropriate resource allocation, or may isolate themselves in the absence of available testing). at a societal level, this may limit the economic and public impact. with an application that allows users to understand potential exposure to an infected individual, and with appropriate action by the exposed individuals, it may be possible to reduce the contact rate by more rapidly identifying cases and exposures, removing them from the contact chain. for example, if we assume uptake of an application amongst x% of a population, and assume that this portion of the population responds to known exposure by self-quarantining or pursuing testing to confirm lack of infection, r will in turn decrease by a multiple of that percentage, depending on the degree of mixing in the population. the reason for the multiplicative decrease is that r partially depends on the population size and density and on the exact number of people an individual may come in contact with after exposure, which varies among individuals. furthermore, with an increasing user base x, there will be an exponential decrease in r (e.g., for 100% use and appropriate action, r would be expected to fall below 1 due to maximal reduction of the contact rate). even a partial uptake will have downstream impacts on the individuals that an app user may have come in contact with, through more rapid exposure and contact identification. this may eventually disrupt the contact rate, which may reduce r by more than the uptake percentage alone would suggest. given sufficient use and appropriate response to the data, this effect on r will hopefully disrupt ongoing chains of transmission, thereby affecting the mortality rate and eventually the contact rate and the infection curve. high enough utilization could reduce the contact rate to such a degree as to make the overall r < 1, which would ideally lead to the infection dying off entirely.

almost half of the world's population carries a device capable of gps tracking. with this capability, location trails (timestamped logs of an individual's location) can be created. by comparing a user's location trails with those of diagnosed carriers of an infectious disease, one can identify users who have been in close proximity to a diagnosed carrier and thereby enable contact tracing. as the covid-19 outbreak spreads, governments and private actors have developed and deployed various technologies to inform citizens of possible exposure to the pathogen. in the following, we give a brief overview of these technologies. we take this opportunity to define several critical terms used throughout this paper.

• users are individuals who have not been diagnosed with an infectious disease and who use a contact-tracing tool to better understand their exposure history and risk of disease.
• diagnosed carriers refers to individuals who have had a confirmatory diagnostic test and are known to have an infectious disease.
of note, in the setting of an epidemic in which some infected individuals have mild or no symptoms, a subset of users will in fact be unidentified carriers; an inherent limitation of all containment strategies is therefore society's ability to identify and confirm disease in such individuals.

• location trails refer to the time-stamped list of gps locations of a device and, presumably therefore, of the owner of the device.
• finally, we broadly speak of the government as the entity which makes location data public and informs those individuals who were likely in close contact with a diagnosed carrier, acknowledging that this responsibility is carried out by a different central actor in every continent, country or local region.
• local businesses refer to any private establishment, such as shops, restaurants or fitness clubs, as well as community institutions like libraries and museums.

broadcasting refers to any method, supported by technology, by which governments publicly share locations that diagnosed carriers have visited within the time frame of contagion. governments broadcast these locations through several methods: for example, singapore updates a map with detailed information about each covid-19 case, south korea sends text messages containing personal information about diagnosed carriers to inform citizens, and in the us, nebraska and iowa have published information about where diagnosed carriers had been through media outlets and government websites. broadcasting methods can be an easy and fast way for a government to quickly make this information public without the need for any data from other citizens; they require citizens to access the information provided and to evaluate for themselves whether they may have come in contact with a diagnosed carrier of a pathogen. however, broadcasting methods risk exposing diagnosed carriers' identities and require exposing the locations with which the diagnosed carrier interacted, making these places, and the businesses occupying them, susceptible to boycott, harassment, and other punitive measures.

selective broadcasting releases information about locations that diagnosed carriers have visited to a select group, rather than to the general public. for example, information might be selectively broadcast to people within a single region of a country. selective broadcasting requires the collection of information, such as a phone number or current location, from users in order to define the selected groups; often, a user must sign up and subscribe to the service, e.g. via a downloaded app. selective broadcasting operates under one of two modes: (i) the broadcaster knows the (approximate) location of the user and sends a location-specific message, so that the user's location privacy is compromised, or (ii) the broadcaster sends a message to all users, but the app displays only the messages relevant to the user's current location. the second approach is typically used when messages are intermittent.
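as a minimal sketch of this second mode, the snippet below filters broadcast alerts on the device against the user's current position, so that no location ever leaves the phone. the message format, the distance cut-off and the haversine helper are illustrative assumptions rather than details taken from any deployed app.

```python
import math

def _distance_km(lat1, lon1, lat2, lon2):
    # haversine distance between two coordinates, in kilometres
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def relevant_alerts(all_alerts, user_lat, user_lon, radius_km=10.0):
    """Keep only the broadcast alerts near the user's current position.
    The filtering happens on the device; the user's location is never
    reported back to the broadcaster."""
    return [a for a in all_alerts
            if _distance_km(user_lat, user_lon, a["lat"], a["lon"]) <= radius_km]
```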
an example of the second mode is katwarn, a german government crisis app which, once downloaded and granted access to location data, notifies users within a defined area of any major event that may impact their safety, such as a natural disaster or terrorist attack. user privacy is compromised by apps using the first mode, as the broadcasting agent receives information about the user's location; apps using the second mode do not share this limitation, since location data is not reported back to the broadcaster. in addition to the risk to the user's privacy with selective broadcasting, the same risks seen with broadcasting apply: identification of the diagnosed carrier and harassment of locations associated with the diagnosed carrier. further, requiring a user to sign up and subscribe risks decreased participation by possible users.

unicasting informs only those users who have been in close contact with a diagnosed carrier. it requires government access to data not only of diagnosed carriers, but also of every citizen who may have crossed their path, and the transmission is unique to every user. china developed a unicasting system which shows who poses a risk of contagion. while highly effective at identifying exposed users for containment interventions, unicasting presents a grave risk of a surveillance state and government abuse.

in participatory sharing, diagnosed carriers voluntarily share their location trails with the public without prompting by a central entity, such as a government. advantageously, with participatory sharing, diagnosed carriers retain control of their data and presumably consent to its release. users are required to independently seek the information and assess their own exposure risks. however, these solutions present challenges, as it is difficult to check for fraud and abuse.

risks exist for both the individual and the public with the use of contact-tracing technology. the primary challenge for these technologies, as evident from their deployment in the covid-19 crisis, remains securing the privacy of individuals, of diagnosed carriers of a pathogen, and of local businesses visited by diagnosed carriers, while still informing users of potential contacts. additionally, contact-tracing technologies offer opportunities for bad actors to create fear, spread panic, perpetrate fraud, spread misinformation, or establish a surveillance state. all containment strategies require analysis of diagnosed carriers' location trails in order to identify other individuals at risk of infection. diagnosed carriers, therefore, are at the greatest risk of having their privacy violated, for example through public identification. even when personal information is not published, these individuals may be identified from the limited set of location data points released. when identified publicly, diagnosed carriers often face harsh social stigma and persecution. in one example, data sent out by the south korean government to inform residents about the movements of those recently diagnosed with covid-19 sparked speculation about individuals' personal lives, from rumors of plastic surgery to infidelity and prostitution. online witch hunts aiming to identify diagnosed carriers create an atmosphere of fear; as those affected have painfully articulated, the social stigma can be worse than the disease itself. with all currently available contact-tracing technologies, the risk of public identification of the diagnosed carrier remains high, and further innovation is necessary to protect high-risk populations. users also face privacy violations: providing an exposure risk assessment to the user requires the user's location data in order to establish where the user's path has crossed with that of a diagnosed carrier. moreover, enabling access to contact-tracing technology may, at times, violate the privacy of a non-user, since users and non-users are networked together through social relationships and environmental proximity.
when a family member's or friend's identity as a diagnosed carrier is revealed, non-users close to the diagnosed carrier may endure the same public stigmatization and social repercussions. when a business loses customers or faces harassment due to association with a diagnosed carrier's location trail, its patrons and, particularly, its employees bear the economic and social burden, whether or not they use contact-tracing technology. non-users may be further negatively affected if location trails pinpoint sensitive locations, such as military bases and secure research laboratories. obtaining consent for any form of data collection and use helps manage privacy risks; consent's utility in real-world settings, however, is often undermined. language which is incomprehensible for typical users and a lack of real choice (e.g. users must often relinquish privacy and share their data in order to receive a service, or opt not to use the service at all) severely limit the power of consent. contact-tracing technologies have yet to overcome the challenges associated with obtaining true consent from the user: typically, a user is required to share their location with a third party in order to receive an exposure risk assessment. during an epidemic, complex and quickly evolving data must be accurately conveyed to and understood by the entire public, including individuals with low health literacy. serious harm, including heightened alarm among the public, may result from failure to appropriately communicate health risks, and contact-tracing technologies have the potential to introduce misinformation and cause panic. for example, if users receive an alert about a possible contact location without appropriate information about, and understanding of, the exposure time frame, some users will inaccurately conclude that they are at high risk. even when information regarding both location and time is provided, if the magnitude of the risk cannot be easily comprehended, an atmosphere of fear or a run on the medical system may be provoked. conversely, feeling a false sense of safety at not having received a notification of exposure, some users may underestimate their risk of disease; users who no longer perceive a significant risk may be less likely to engage in other forms of disease prevention, such as social distancing. such a false sense of safety may arise when the limitations of contact-tracing technology within a community are not clearly communicated to the public. technological interventions in human crises are also often targeted for fraud and abuse. in south korea, fraudsters quickly began blackmailing local merchants, demanding ransoms in exchange for not (falsely) reporting themselves as sick and as having visited the business. additionally, bad actors may force individuals to provide their location data for purposes other than disease containment, such as immigration or policing purposes. fear of such abuse may prevent a contact-tracing system meant to help save lives from being adopted. hacking also lingers as a serious risk for all technologies gathering sensitive information such as health status and location; hackers have successfully infiltrated such apps and services before, with data from roughly 92 million accounts of the genealogy and dna testing service myheritage exposed in a breach disclosed in 2018. data security must therefore lie at the center of every effort to use location data for contact tracing and containment. finally, ensuring equity and social justice challenges many technologies, including contact tracing.
if participation requires ownership of a smartphone, some people, often those most vulnerable (the elderly, the homeless, and those living in lower-income countries), will not be able to access the technology. a lack of access to devices among vulnerable populations will remain a significant challenge for contact-tracing technology in the near future. avoidance by the public may impact any business identified on a diagnosed carrier's location trail, but reduced hours or job loss hurt lower-income service workers most. finally, abuse of data collection and violations of user privacy are inflicted more often upon those who are already most vulnerable to government surveillance.

mapping the various contact-tracing technological approaches against the reviewed risks and challenges shows that the inverse relationship between the accuracy of the provided risk assessment and user privacy necessitates compromise by the user community. this core trade-off between utility and user privacy also highlights the potential of private kit: safe paths to fundamentally alter the relationship. deploying any form of contact-tracing technology requires contemplation of the risks outlined in the prior analysis, and mitigation of these risks depends on thoughtful consideration of the trade-offs inherent to contact-tracing technology and containment strategies. in the following, we review the decisions required for these trade-offs and the best approaches for risk mitigation.

data must be collected from diagnosed carriers to facilitate containment of an epidemic. however, both the data collection and the release of that information to identified contacts may violate the diagnosed carrier's privacy. since the diagnosed carrier is the most vulnerable stakeholder in the containment strategy, several efforts must be undertaken to protect their privacy to the highest degree possible. limiting the publicly published data helps protect the known carrier's identity. to date, with the exception of participatory sharing models, the diagnosed carrier's data must be shared with a third-party entity, requiring the carrier to relinquish at least some control over their data; ending the need for third-party involvement would represent an immense step forward in privacy protection for diagnosed carriers. access to and usage of the data by any entity, mostly governments, should be limited and highly regulated, and harsh penalties for the abuse of such data should be established. obtaining true user consent further protects diagnosed carriers. not all approaches in use today require consent to share personal data; particularly in non-democratic regimes, diagnosed carriers may be unable to deny consent, and in other instances all users must consent to share their data in order to be informed of their own exposure risk. we believe no one should be obligated to share their personal information. time-limited storage of location trails further protects the privacy of diagnosed carriers. finally, using an open-source approach to create an app fosters trust in the app's privacy-protection capabilities, as independent experts and media can access and evaluate the source code.

containment of an epidemic requires publication of sites of known exposure to a diagnosed carrier. yet doing so risks harassment of the local businesses at these sites.
providing broader location data may better protect the privacy of a local business, but it also affects the accuracy of the risk assessment; even coarse location data, such as notice that a diagnosed carrier sojourned somewhere within a larger grid area, may still identify a business. any contact-tracing approach must balance the public health benefit of disease containment against the threat of economic hardship for local businesses connected to the epidemic. there is no easy answer to this trade-off, as any choice impacts the utility of the technology and risks affecting the viability of the business. evaluating the risk versus benefit of releasing location data should occur on a case-by-case basis. the time frame of possible contagion must be released so that users may understand the limits of the exposure risk. critically, the entity publishing the location data should consult with the local business and inform it of any decision before the public is notified.

issues of access and inclusion are not easily resolved by contact-tracing technology. limited access to a device capable of utilizing contact-tracing technology, and difficulty understanding and acting on the provided risk assessment, disproportionately affect the more marginalized members of our societies. however, containing an epidemic outbreak quickly benefits everyone within a community, so implementation of contact-tracing technology within a community, even with unequal access, may increase the safety of all. the development of a simple gps device that can share location trails may be a medium-term solution to some accessibility concerns, particularly in countries with limited smartphone penetration. additionally, some form of access to information about a possible contagion must be made available to those without a smartphone, and all information should be presented in a way that accounts for variation in health literacy among users.

the spread of misinformation cultivates instability and uncertainty during a crisis. releasing information on the spread of a pathogen to the public invites public speculation, fear-mongering and manipulation by bad actors. a false sense of safety among users may increase alongside increased efficiency of contact-tracing technology, and entities providing such technology are also at risk of introducing errors into the released information, despite best intentions. at this time, no strategies exist to eliminate these risks; however, they can be mitigated through educational outreach efforts and engagement with key stakeholders.

storage of sensitive information invites attack by hackers, and trade-offs must be made to mitigate this risk. only anonymized, redacted, and aggregated sensitive information should be stored. use of a distributed network, rather than a central server, makes hacking less attractive, but requires providing security at multiple sites. in the long term, the safest way to store location data will be in an encrypted database inaccessible to all, including the government. time limitations on data storage also work well to secure information and should be implemented in contact-tracing technology. during an epidemic outbreak, the appropriate amount of time for data storage equals the time during which a diagnosed carrier could possibly have infected another individual; for covid-19, this amounts to a window of a few weeks at most. deleting data after such a short period, particularly during an outbreak of a poorly understood pathogen, has risks.
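a minimal sketch of such time-limited storage is shown below: location points older than a configurable retention window are simply purged from a user's trail. the concrete window length is a deployment parameter and is only given here as an illustrative placeholder.

```python
import time

RETENTION_SECONDS = 14 * 24 * 3600  # illustrative retention window, not a prescribed value

def purge_expired(trail, now=None):
    """Drop all location points older than the retention window.
    trail: list of (timestamp, lat, lon) tuples."""
    if now is None:
        now = time.time()
    return [(ts, lat, lon) for ts, lat, lon in trail
            if now - ts <= RETENTION_SECONDS]
```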
nevertheless, we feel this trade-off should be made in favour of data security and user privacy.

our ability to accurately trace the contacts of individuals diagnosed with a pathogen, and to notify others who may have been exposed, has never been greater. real risks exist, though, so care must be taken in the design of any solution to prevent abuse and mass surveillance. as a beginning to the discussion of how to develop and deploy contact-tracing technologies in a manner which best protects the privacy and data security of their users, we have reviewed various technological methods for contact tracing and discussed the risks to both individuals and societies. private kit: safe paths eliminates the risk of government surveillance; it draws on the advantages of several models of contact-tracing technology while better mitigating the challenges posed by the use of such technology. we have presented a discussion of the precautions which should be taken and the trade-offs which will need to be made, and we invite feedback and discussion on this whitepaper.

we would like to acknowledge amandeep gill of the international digital health and artificial intelligence research collaborative, bernardo mariano jr. of the world health organization (who), and don rucker of the u.s. department of health and human services (hhs) for their mentorship in advancing contact-tracing solutions.
key: cord- -lt m h authors: witschel, hans friedrich; riesen, kaspar; grether, loris title: kvgr: a graph-based interface for explorative sequential question answering on heterogeneous information sources date: - - journal: advances in information retrieval doi: . / - - - - _ sha: doc_id: cord_uid: lt m h

exploring a knowledge base is often an iterative process: initially vague information needs are refined by interaction. we propose a novel approach for such interaction that supports sequential question answering (sqa) on knowledge graphs. as opposed to previous work, we focus on exploratory settings, which we support with a visual representation of graph structures, helping users to better understand relationships. in addition, our approach keeps track of context, an important challenge in sqa, by allowing users to make their focus explicit via subgraph selection. our results show that the interaction principle is either understood immediately or picked up very quickly, and that the possibility of exploring the information space iteratively is appreciated.
that is, a user might ask a rather "fuzzy" first question (such as "what are important topics in the field of 'information retrieval' ?") and then -when studying the answer -start to think of new questions, concerning some of the new concepts found in that answer. although the concept of exploratory search is well known from the field of information retrieval, this exploratory motivation for performing sequential question answering (over structured knowledge bases) has not been studied so far. in any case, sequential question answering raises the major challenge of keeping track of context: since they assume the context to be known from the prior questions and answers, users tend to leave away sentence elements [ ] . especially in exploratory search settings, answers to fuzzy questions can be very complex, involving a large number of concepts and relations. hence, researchers have proposed various kinds of visualisations in order to aid users in grasping such complexity and studying relationships between concepts [ , ] . in our work, we aim at building a context-aware sequential question answering system, especially suited for exploratory search. to this end, the solution is based on a knowledge graph -which integrates information from various structured and unstructured data sources, see sect. . . since the visualization of graphs provides an intuitive overview of complex structures and relationships [ ] , our system allows users to ask questions in natural language, but provides answers via a visual representation of subgraphs of the underlying knowledge graph. it supports both the user and the system in keeping track of the context/current focus of the search via a novel interaction concept that combines pointing/clicking and asking questions in natural language, described in sect. . . we will show empirically that users appreciate the new interaction concept and its ability to define context and focus graphically, see sect. . both question answering and natural language interfaces to databases (nlidb, see [ ] for a survey) have a long history. they share many characteristics since both support querying of knowledge bases using natural language. many question answering systems retrieve answers from textual (i.e. unstructured) resources, but there are also many approaches based on structured content, often in the form of ontologies [ ] . in nlidb, many challenges have been addressed, e.g. making systems domain-independent [ ] or overcoming specific difficulties with certain query languages, above all sql [ ] . recent advances in this area are relying on sequence-to-sequence models [ , ] , based on encoding and decoding of sequences via deep (reinforcement) learning. an obvious drawback of these supervised learning approaches -as opposed to earlier hand-crafted rule-based grammars -is the amount of training data required. although large hand-annotated datasets have been published [ , ] , trained models cannot be expected to be fully domain-independent. while the fields of question answering (over structured data), semantic parsing and nlidb are obviously quite advanced, researchers have only recently begun to study the domain of "sequential question answering" (sqa). this new focus on interactive, dialog-driven access to knowledge bases is based on the insight that users rarely pose a question to such a knowledge base and then quit [ , ] . instead, a more common and natural access pattern consists in posing a series of questions. 
most researchers in sqa assume that the motivation for dialogs comes from the need to decompose complex questions into simple ones [ , ] . some researchers propose to perform such decomposition algorithmically [ ] , while others provide evidence that it is more natural and realistic to assume that humans will like to perform this decomposition themselves, resulting in a series of simple, but inter-related questions [ ] . a key challenge in any form of sequential or conversational question answering is the resolution of ellipses (e.g. omissions of arguments in relations) or anaphora which are very frequent in a dialogue where the user expects the system to keep track of the context [ , , ] . these approaches all assume that a searcher always accesses a knowledge base with a clear question in mind. as outlined above, we advocate a wider perspective on sqa, including scenarios of an exploratory nature. in information retrieval, it has been thoroughly accepted that there exist situations in which users are unable to clearly articulate information needs, e.g. when trying to get acquainted with a new field where terminology is still unknown [ ] . thus, users would like to explore, and often their questions become better articulated as they learn more about the new field. in order to support them in grasping relationships between new concepts in the -often very complex -answers to their fuzzy questions, ir researchers have proposed result set visualisations that provide a better overview than the typical ranked lists of document references [ , ] . using visualisations, especially of graphs/ontologies as an output of retrieval systems has also been proposed, mainly in qa and nlidb that are based on knowledge graphs [ , , ] . visualising graph query results is different from visualising graphs in general; the former resembles generation of results snippets in text retrieval [ ] . however, we can learn and employ mechanisms from general approaches to analysing large graphs, e.g. by applying global ranking mechanisms (such as pagerank) or by summarizing properties of selected nodes [ ] . as pointed out in [ ] , visual graph analysis requires, besides the visual representation of graph structures, to have good interaction mechanisms and algorithmic analysis, such as aggregation/merging of nodes, identification of certain graph structures (such as cliques) or node ranking mechanisms such as pagerank. additional challenges originate in the fuzziness of natural language and the potential resulting number of (partially) matching result graphs. graph summarization approaches have been proposed as a solution [ , ] -where summarized/aggregated graph structures play the role of snippets. another approach [ ] uses result previews to narrow down result sets via "early" user interaction. while approaches to semantic parsing, nlidb and question answering over structured data are well studied, there is a recent rise in interest in better studying and supporting the interaction in sequential question answering (sqa) scenarios. however, the emerging field of sqa lacks -in our opinion -a clear idea of why users want to engage in a conversation. we claim that one important motivation can be found in exploratory settings where users need to first gain insights by interacting with a knowledge base, before being able to ask the "right" questions. 
another challenge in sqa is keeping track of context: in their survey on semantic parsing, kamath & das [ ] mention "adding human in the loop for query refinement" as a promising future research direction in cases where the system is uncertain in its predictions. our contribution consists mainly in proposing a new interaction paradigm which allows users to ask questions in natural language and to receive answers in the form of visualised subgraphs of a knowledge graph. users can then interact with that subgraph to define the focus of their further research before asking the next question. with this human involvement, we can show empirically both how the human benefits from clarifying the search direction while exploring the knowledge graph and how the machine is supported in understanding incomplete questions better, because their context is made explicit. we further use a robust query relaxation approach to trade precision for recall when recall is low. our approach is domain-independent and does not require training data; it only requires a specification of node type names and their possible synonyms. it can be seen as a "traditional" and simple grammar-based approach: the focus is not on sophisticated semantic parsing (we might add e.g. sequence-to-sequence models later), but on the interactive process of graph exploration via natural language.

the knowledge graph underlying our experiments was constructed from a collection of heterogeneous sources and stored in a neo4j graph database. for our experiments, we chose books as a domain and aimed at retrieving all information, from various sources, which users (leisure-time readers, students, ...) might find relevant, ranging from core bibliographic information, over author-related information (affiliation, prizes won), to reviews and social media coverage of books. to populate it, we implemented a collection of parsers for a variety of data sources:

- for structured data, we built an xml parser (which can be applied to structured xml databases, but also to semi-structured xml files) and an rdf parser. the xml parser was used to integrate a sample of data from the bibliographic platform ipegma, while the rdf parser was applied to the dbpedia sparql endpoint to retrieve data about books, persons, their institutes and awards. the ipegma data covers mostly german books, while the dbpedia data is focused on english books.
- in terms of semi-structured data, our html parser can process web content, and a special twitter parser deals with tweets (and uses the html parser to process web pages linked from tweets). we applied the html parser to the websites literaturkritik.de and www.complete-review.com to retrieve book reviews and related book metadata in german and english. the twitter parser was applied to a collection of twitter accounts of major publishers, whose timelines were analysed for tweets referring to books.
- we also integrated a sentiment analysis service (the aylien text api) as a typical example of analysing the unstructured part of web pages, i.e. the plain text. in our case, we applied the service to the book reviews from literaturkritik.de to find out whether reviews were positive or negative; for www.complete-review.com, this information could be parsed directly from the web page.

in neo4j, it is not required to define a schema (i.e. node or relation types) before inserting nodes or relationships. we used this property heavily: each parser has a configuration file in which one can define the node and relation types to be extracted.
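the paper does not reproduce its configuration syntax, so the following is a purely illustrative python sketch of what such a parser configuration could look like: node and relation types per source, extraction patterns, and a uniqueness attribute that acts as a primary-key-like identifier during integration (described in the next paragraph). all field names, selectors and the site chosen are assumptions.

```python
# Hypothetical configuration for the HTML parser, covering one review site.
# The pattern syntax (here: CSS selectors) is our own illustration.
BOOK_REVIEW_CONFIG = {
    "source": "https://www.complete-review.com",
    "nodes": {
        "Book": {
            "patterns": {"title": "h1.book-title", "isbn": "span.isbn"},
            "unique": "isbn",          # primary-key-like attribute for integration
        },
        "Review": {
            "patterns": {"text": "div.review-body", "verdict": "span.verdict"},
            "unique": "url",
        },
    },
    "relations": [
        {"type": "HAS_REVIEW", "from": "Book", "to": "Review"},
    ],
}
```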
we have developed a special syntax with which one can define the patterns to be searched for within the various data sources in order to retrieve the corresponding data. this means that parsers can be extended to find new types of nodes and relationships, and/or to cover new data sources of a known type, without modifying the parser's program code. typically, the specifications for the various data sources have overlapping node types, which results in a data integration task. in order to match identical nodes (e.g. the same book) found in different data sources, the definitions also specify a "uniqueness attribute" (similar to a primary key in relational databases). as a result, the knowledge base consists of a single integrated graph. we have chosen a graph database because graphs are a very natural way of modeling relationships and are easy to visualise and interact with [ ].

as laid out in the related work, most previous work sees sequential question answering as a conversation in which complex questions are broken down into simpler ones. for instance, iyyer et al. [ ] assume that users already have a complex question in mind at the initial state of a conversation, which they then decompose. in contrast, our new interaction concept aims at supporting scenarios that are more exploratory in nature (cf. exploratory search in text retrieval [ ]). in such settings, users often ask series of questions that emerge one from another, i.e. the answer to a first question triggers the next one, and so on, without the final goal of such a conversation being clear initially. we propose a novel interaction mechanism for such an exploratory "conversation", where questions are posed in natural language, but answers are given in the form of subgraph visualisations, with the possibility to interact with and select parts of subgraphs for further exploration (again via asking questions). note that it does not matter whether a user starts from general concepts and "zooms in" to more specific ones or vice versa. in exploratory search, it is typical that, since the nature of the problem is unclear to the user, queries are imprecise or "tentative" [ ]. this very often implies that the answers, much more than the questions or queries, can be quite complex. as pointed out in [ ], systems that support exploration hence often offer visualisation of search results as well as interaction mechanisms for further exploration. in our case, results are (possibly large) subgraphs of a given knowledge graph. by studying such a subgraph and interacting with it, a user may learn about important concepts and relations in a domain, and this leads to asking the next question(s). a next question may aim either at filtering the current subgraph or at further broadening the scope by expanding a subgraph region with further related nodes.

the design of our interaction concept was informed by a questionnaire which was filled out by a sample of students. participants received a description of a situation (e.g. having read a good book) and were asked to formulate some questions that they would have in such a situation. we analysed their answers, looking for common patterns of questions and expected result sets. our resulting interaction concept is very simple: based on an initial keyword search or question, a user finds an entry point into the graph, i.e. an initial subgraph. from this point on, provided that the user would like to continue the current session, there are two main possibilities for exploration in each step i:
1. use the graphical user interface, e.g. expand the subgraph g_i by unhiding all nodes related to a chosen node.
2. select a node or a set of nodes n_{g_i} as a "context" and ask a question about it. selection can be done (a) directly via one or more clicks on nodes or (b) by selecting all nodes of a certain type via a button.

each interaction leads to a new graph g_{i+1}. while option 1 is not new, option 2 can lead to a new form of sequential question answering, with questions being asked in natural language and answers given as visualisations of subgraphs. this combination is user-friendly since, on the one hand, natural language is the most natural form of expressing information needs, which is the basis of all nlidb research and conversational interfaces. on the other hand, researchers in both the information retrieval [ ] and graph querying [ ] communities use visualisations to improve the user-friendliness of exploratory search. in addition, we claim (and will later show empirically) that, while it is not natural for users to repeat entity names from an earlier question, it is rather natural for them to select preliminary results and thus make the context explicit. we will show that such selection is even often helpful for their own understanding of how a question-answer sequence develops and of what they have learned so far and want to learn next. since the user specifies the context explicitly when using option 2, it is easy for our system to fill in missing parts of questions by simply assuming that they originate from that context.

the accompanying figure illustrates the interaction concept with a small "exploration session" from the book domain. in short, the session consists in a user searching for an author, then demanding to see all books by that author, and finally asking which of these books have positive reviews. note how the visualisation of the result graph helps her to get a quick overview of complex structures, for instance to see at a glance which books have many vs. few positive reviews (yellow nodes) in the last result.

in order to realise the interaction described in the previous section, kvgr builds several components on top of the knowledge graph. all of these components are visible on the user interface, and the numbers in the corresponding figure refer to the (backend) components in the following enumeration:
1. fielded keyword search: each node in the knowledge graph is treated as a document and its (textual) attributes as fields. field weights are domain-specific; in the book domain, the "title" field of books has a higher weight than e.g. the "genre" field. the number of shown nodes is limited by applying a cut-off to node scores.
2. semantic parser, described below.
3. graph visualisation and interaction, allowing common basic graph interactions plus selecting a context.

since semantic parsing is not the core contribution of our work, we have built a simple but robust grammar for parsing. it takes advantage of the interaction concept and the basic principles of graphs, but makes no further assumptions about the graph schema; it can be adapted easily to new domains simply by providing a lexicon of node types (see below). the grammar consists of jape rules in gate, which annotate occurrences of graph nodes in user utterances based on a simple lookup mechanism using a lexicon with manually maintained synonyms. each annotation is associated with a number of features (see the corresponding figure).
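the lookup step can be pictured with the minimal python sketch below. the feature names ("this", return type) mirror those in the paper's example; the lexicon entries and the heuristic for detecting the return type are illustrative assumptions, not the actual jape rules.

```python
# Toy lexicon mapping surface forms (and synonyms) to node types of the graph.
LEXICON = {
    "book": "Book", "books": "Book",
    "author": "Person", "authors": "Person",
    "journal": "Journal", "journals": "Journal",
    "review": "Review", "reviews": "Review",
}
SELECTION_WORDS = {"it", "this", "these", "them"}

def annotate(question: str):
    """Return one annotation per recognised node type or selection reference."""
    annotations = []
    tokens = question.lower().strip("?").split()
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            annotations.append({
                "nodeType": LEXICON[tok],
                # crude heuristic: a node type right after the question word
                # is treated as the return type of the question
                "isReturnType": i <= 1,
                "this": False,
            })
        elif tok in SELECTION_WORDS:
            # pronoun referring to the currently selected subgraph context
            annotations.append({"nodeType": None, "isReturnType": False, "this": True})
    return annotations

# example: annotate("which journals have published it?")
```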
the annotated questions are then passed to a cypher generator, which simply takes all nodes found in an utterance and generates a relationship pattern that is matched against the graph. we illustrate our parser with the example shown in fig. . the parts of the question recognised as nodes are put in bold font, and their extracted features are presented in the box above. the grammar has marked "journal" as a return node type and "it" as referring to a current user selection ("this=true"). here, the interaction concept is exploited: because the user has selected a book (let us assume the book with id ), the system can assume that the pronoun "it" refers to that current selection (the same would apply to a phrase like "this book"). this information is enough for the cypher generator to generate a corresponding cypher query. this query, however, will not retrieve anything, since the question contains an ellipsis: it should actually be formulated as "which journals have published a review about it?". that is, the system needs to extend the pattern to allow an intermediate node type related to both the current selection and the return type nodes. to this end, we have implemented a query relaxation mechanism which will first try out the original cypher query and then - if nothing is returned - will relax the query by allowing such an ellipsis (see the sketch below). the system does not know or specify that the intermediate node z is of type review - thus a negative impact on retrieval precision might result, which we trade for recall here. in order to evaluate our main hypothesis - namely that our new interaction mechanism effectively supports users in iteratively refining an exploratory search - we performed user tests in an exploratory search scenario. to make the sessions more comparable, we pre-defined the information needs: the "story" started with a keyword search for the topic "criminal law" and was continued with some typical questions about e.g. prominent authors in that field, authors who had won prizes, their institutes, as well as books with positive reviews in that field. before each session, participants were instructed about the features of the system via a short demo. within the session, the predefined information needs were explained and users were asked to interact with the system to satisfy them. when users got stuck with interaction or query formulation, help was offered. following the popular "five-user assumption" of usability testing [ ] , we recruited participants: colleagues from our school of business and of our students. none of the subjects were previously aware of our project. this selection was made for practical feasibility reasons - we are aware of the bias, in terms of user characteristics, that it introduces. participants received overall different information needs (q to q ). the first one (q ) started from a single node (the topic "criminal law"), i.e. a context selection was not required. all subsequent ones required participants to select a subset of the nodes that were currently displayed (e.g. all books or all persons). the last information need (q ) was formulated in a complex way ("which authors that have written a book about criminal law have also written a review?") and required participants to recognise that a partial result to the question was already available from a previous step.
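the cypher text itself did not survive the extraction, so the following sketch only illustrates the relaxation mechanism described above; the node labels, the relationship-agnostic patterns and the run_query callable (standing in for the actual graph-database client) are assumptions.

# python sketch of the query relaxation step (labels and patterns are assumed)
def build_queries(return_label):
    # strict pattern: the current selection is directly related to the return-type nodes
    strict = (f"MATCH (s) -- (r:{return_label}) "
              f"WHERE id(s) = $selected_id RETURN DISTINCT r")
    # relaxed pattern: allow one unspecified intermediate node z (the ellipsis case)
    relaxed = (f"MATCH (s) -- (z) -- (r:{return_label}) "
               f"WHERE id(s) = $selected_id RETURN DISTINCT r")
    return strict, relaxed

def answer(run_query, selected_id, return_label):
    """try the strict query first; if it returns nothing, fall back to the relaxed one,
    trading some precision (z's type is unconstrained) for recall."""
    strict, relaxed = build_queries(return_label)
    params = {"selected_id": selected_id}
    return run_query(strict, params) or run_query(relaxed, params)

# usage with a stub client; with a real driver, run_query would execute the cypher text
results = answer(lambda q, p: print(q) or [], selected_id=42, return_label="Journal")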
we observed the participants' difficulties to (a) formulate queries that the semantic parser would understand correctly, (b) grasp the principle of breaking down complex information needs into simpler ones (here, participants would typically try to extend the previous query by adding more constraints) and to (c) remember to select a subset of nodes as a context for their next query. table shows the number of participants facing these problems for each of the test queries. in terms of query reformulation, there is no clear pattern -we observed a number of ways in which our grammar can be improved. grasping the process of iterative refinement shows a clear learning curve: while two participants had understood the principle immediately from the introductory demo, the other three needed only one experience with q to grasp it. we observed that the problems with q resulted merely from participants not accurately understanding the complex question -they both said that it would have been different if it had been their own information need. remembering to select a subset of nodes as a context was harder: while two participants never forgot to do it, one needed q , another one q and q to remember it; one participant could not get used to it until the end of the test. the persons who struggled expressed their expectation that -if they did not select any nodes, but asked a question like "which of these persons..."the system should automatically assume that it referred to all currently visible persons. since this is easy to build into our system, we can conclude that context selection will not be an issue once the principle of iterative refinement has been grasped. besides observing the query formulation and interaction strategies of the users -including their need for help -we asked the users to give us feedback on the following points: -intuitiveness of context selection: three participants stated that they found it intuitive and natural to select a context for their query and to break down complex questions. the other two expressed their expectation for the system to identify context automatically (see above). -results of elliptic queries: queries containing "intermediate nodes", e.g. a query "show me all authors who have written about criminal law" would show not only authors, but also their books, although the question did not ask for books. only one participant had difficulties in understanding what was shown (because the legend was not clear to him). when judging the result, participants said that seeing the books was interesting, especially for someone wishing to explore criminal law as a new area, while participants remarked that the result was not strictly what they had asked for. two participants stated that they would appreciate to see a list of persons -in addition to the graph visualisation. -general feedback on the interaction was very positive. despite the observed difficulties that did occur with query formulation, all participants said that they were impressed with the ability of the system to understand queries in natural language. four participants mentioned explicitly that the visual representation helped them to better understand relationships and to see "how things belong together". one participant said that it sparked his curiosity to explore further. 
all participants stated that the interaction mechanism was either "intuitive" or at least "easy to learn" (because, as they stated, "the effect of what you do is always visible") and three of them mentioned expressly that they liked the refinement process of breaking down complex queries. participants also came forth with a number of suggestions for improvement: two participants stated that they would appreciate if the system could understand -besides fully formulated questions -keyword-based inputs. the same participants and a third one expressed their wish to have result lists, in addition to a graph. the main reason mentioned for this was the lack of a ranking provided in the graph. the participants said that they would not know where to start looking if a result graph grew too large. -comparison to traditional interfaces, especially ones with list-based result presentation: participants said that our system would be more effective in supporting "detailed investigation" that required to "understand relationships", whereas traditional list-based systems would be better suited to get an overview of e.g. the most important books on criminal law because of their clear ranking. in this work, we have proposed a novel context-aware sequential question answering system, especially suited for exploratory search, based on graph visualisation for result presentation and iterative refinement of information needs. this refinement in turn is based on the selection of subsets of nodes for context definition and natural language questions towards this context. our results are somewhat limited by the specific scenario and use case that we explored and the small user group involved. however, they do show quite clearly that users either understand the principle immediately or pick it up very quickly -and that they appreciate the possibility of exploring the information space iteratively. having to explicitly select context is hard to get used to for some, and should be automated. the visual representation of results was well received for its support of understanding relationships. on the other hand, it became clear that ranking or highlighting the more "relevant" nodes will be needed to help users focus, especially when results get larger. thus, our main goal for future work will be to investigate the best way to incorporate node scoring into the system -either visually (e.g. via node sizes) or by providing ranked result lists in addition to and linked to the graph. because of the limitations of our participant selection strategy, further test with a more varied user group will also be required. finally, it might be interesting to explore the possibility for users to combine search results (sub-graphs) of queries before exploring the combined results further. 
references:
- adaptive visualization for exploratory information retrieval
- graph querying meets hci: state of the art and future directions
- dialog-to-action: conversational question answering over a large-scale knowledge base
- quble: towards blending interactive visual subgraph search queries on large networks
- search-based neural structured learning for sequential question answering
- a survey on semantic parsing
- deep reinforcement learning for sequence to sequence models
- perseus: an interactive large-scale graph mining and visualization tool
- incomplete follow-up question resolution using retrieval based sequence to sequence learning
- how to make a natural language interface to query databases accessible to everyone: an example
- is question answering fit for the semantic web?: a survey
- exploratory search: from finding to understanding
- efficient processing of keyword queries over graph databases for finding effective answers
- complex sequential question answering: towards learning to converse over linked question answer pairs with a knowledge graph
- the web as a knowledge-base for answering complex questions
- fast generation of result snippets in web search
- an end-to-end neural natural language interface for databases
- refining the test phase of usability evaluation: how many subjects is enough
- visual analysis of large graphs: state-of-the-art and future research challenges
- exploratory search interfaces: categorization
- summarizing answer graphs induced by keyword queries
- sqlnet: generating structured queries from natural language without reinforcement learning
- slq: a user-friendly graph querying system
- spider: a large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task
- seq sql: generating structured queries from natural language using reinforcement learning

key: cord- -uy dykhg authors: albanese, federico; lombardi, leandro; feuerstein, esteban; balenzuela, pablo title: predicting shifting individuals using text mining and graph machine learning on twitter date: - - journal: nan doi: nan sha: doc_id: cord_uid: uy dykhg the formation of majorities in public discussions often depends on individuals who shift their opinion over time. the detection and characterization of this type of individual is therefore extremely important for the political analysis of social networks. in this paper, we study changes in individuals' affiliations on twitter using natural language processing techniques and graph machine learning algorithms. in particular, we collected million twitter messages from . million users and constructed the retweet networks. we identified communities with explicit political orientation and topics of discussion associated with them, which provide the topological representation of the political map on twitter in the analyzed periods. with that data, we present a machine learning framework for social media user classification which efficiently detects "shifting users" (i.e. users that may change their affiliation over time). moreover, this machine learning framework allows us to identify not only which topics are more persuasive (using a low dimensional topic embedding), but also which individuals are more likely to change their affiliation given their topological properties in a twitter graph. technologically mediated social networks flourished as a social phenomenon at the beginning of this century with exponents such as friendster ( ) or myspace ( ) [ ] , but other popular websites soon took their place.
twitter is an online platform where news or data can reach millions of users in a matter of minutes [ ] . twitter is also of great academic interest, since individuals voluntarily express openly their opinions and they can interact with other users by retweeting the others' tweets. in particular, in the last decade there has been an increase in interest from computational social scientists and numerous political studies have been published using information from this platform [ ] [ ] [ ] [ ] [ ] [ ] . previous works applied different machine learning models to these datasets. xu et al. collected tweets using the streaming api and implemented an unsupervised machine learning framework for detecting online wildlife trafficking using topic modeling [ ] . kurnaz et al. proposed a methodology which first extracts features of a tweet text and then applies deep sparse autoencoders in order to classify the sentiment of tweets [ ] . pinto et al. detected and analyzed the topics of discussion in the text of tweets and news articles, using non negative matrix factorization [ ] , in order to understand the role of mass media in the formation of public opinion [ ] . on the other hand, kannangara implemented a probabilistic method so as to identify the topic, sentiment and political orientation of tweets [ ] . some other works are focused in political analysis and the interaction between users, as for instance the one of aruguete et al., which described how twitter users frame political events by sharing content exclusively with likeminded users forming two well-defined communities [ ] . dang-xuan et al. downloaded tweets during the parliament elections in germany and characterize the role of influencers utilizing the retweet network [ ] . stewart et al. used community detection algorithms over a network of retweets to understand the behavior of trolls in the context of the #blacklivesmatter movement [ ] . conver et al. [ ] also used similar techniques over a retweets network and showed the segregated partisan structure with extremely limited connection between clusters of users with different political ideologies during the u.s. congressional midterm elections. the same polarization on the twitter network can be found in other contexts and countries (canada [ ] , egypt [ ] , venezuela [ ] ). opinion shifts in group discussions have been studied from different points of view. in particular, it was stated that opinion shifts can be produced by arguments interchange, according to the persuasive arguments theory (pat) [ , , ] . primario et al. applied this theory to measure the evolution of the political polarization on twitter during the us presidential election [ ] . in the same line, holthoefer et al analyzed the egyptian polarization dynamics on twitter [ ] . they classified the tweets in two groups (pro/anti military intervention) based on their text and estimated the overall proportion of users that change their position. these works analyzed the macro dynamics of polarization, rather than focus on the individuals. in contrast, we found it interesting not only to characterize the twitter users who change their political opinion, but also predict these "shifting voters". therefore, the focus of this paper is centered on the individuals rather than the aggregated dynamic, using machine learning algorithms. moreover, once we were able to correctly determine these users, we seek to distinguish between persuasive and non persuasive topics . 
in this paper, we examined three twitter network datasets constructed with tweets from: the argentina parliamentary elections, the argentina presidential elections and tweets of donald trump. three datasets were constructed and used in order to show that the methodology can be easily generalized to different scenarios. for each dataset, we analyzed two different time periods and identified the largest communities corresponding to the main political forces. using graph topological information and the topics of discussion detected in the first network, we built and trained a model that effectively predicts when an individual will change his/her community over time, identifying persuasive topics and relevant features of the shifting users. our main contributions are the following: (1) we described a generalized machine learning framework for social media user classification, in particular, for detecting a user's affiliation at a given time and whether the user will change it in the future. this framework includes natural language processing techniques and graph machine learning algorithms in order to describe the features of an individual. (2) we observed that the proposed machine learning model has a good performance for the task of predicting changes of a user's affiliation over time. (3) we experimentally analyzed the machine learning framework by performing a feature importance analysis. while previous works used text, twitter profiles and some tweeting behavior characteristics to automatically classify users with machine learning [ ] [ ] [ ] [ ] , here we showed the value of adding graph features in order to identify the label of a user; in particular, the importance of the "pagerank" for this specific task. (4) we also identified the topics that are considerably more relevant and persuasive to the shifting users. identifying these key topics has a valuable impact for social science and politics. the paper is organized as follows. in the data collection section, we describe the data used in the study. in the methods section, we describe the graph unsupervised learning algorithms and other graph metrics that were used, the natural language processing tools applied to the tweets and the machine learning model. in the results section, we analyze the performance of the model for the task of detecting shifting individuals. finally, we interpret these results in the conclusions section. the code is on github (omitted for anonymity reasons). twitter has several apis available to developers. among them is the streaming api, which allows the developer to download in real time a sample of the tweets that are uploaded to the social network, filtering them by language, terms, hashtags, etc. [ , ] . the data is composed of the tweet id, the text, the date and time of the tweet, and the user id and username, among other features. in case of a retweet, it also has the information of the original tweet's user account. (here, "persuasive" and "non persuasive" refer to topics that are relevant, respectively not relevant, to the shifting individuals.) for this research, we collected datasets: argentina parliamentary elections ( arg), argentina presidential elections ( arg) and united states tweets of donald trump ( us). for the argentinean datasets, the streaming api was used during the week before the primary elections and the week before the general elections took place. keywords were chosen according to the four main political parties present in the elections. details and context can be found in the appendix.
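the collection code itself is not included in this extract; the sketch below shows one plausible way to gather keyword-filtered tweets, assuming the legacy tweepy 3.x streaming interface (StreamListener). the credentials, output file and keyword are placeholders.

import json
import tweepy  # assumes the legacy tweepy 3.x streaming interface

KEYWORDS = ["realdonaldtrump"]        # placeholder: one keyword of the us dataset
OUTPUT_FILE = "tweets.jsonl"

class SaveListener(tweepy.StreamListener):
    def on_status(self, status):
        # keep tweet id, text, timestamp, user and (for retweets) the original user
        retweeted = getattr(status, "retweeted_status", None)
        record = {
            "id": status.id_str,
            "text": status.text,
            "created_at": str(status.created_at),
            "user": status.user.screen_name,
            "retweeted_user": retweeted.user.screen_name if retweeted else None,
        }
        with open(OUTPUT_FILE, "a") as f:
            f.write(json.dumps(record) + "\n")

    def on_error(self, status_code):
        return status_code != 420  # disconnect on rate limiting

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
stream = tweepy.Stream(auth=auth, listener=SaveListener())
stream.filter(track=KEYWORDS, languages=["en"])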
for the us dataset, "realdonaldtrump" (the official account of president donald trump) was used as keyword. twitter messages are in the public domain and only public tweets filtered by the twitter api were collected for this work. for the purpose of this research, we have analyzed more than million tweets and more than . million individuals in total. the specific start and end collection date, the total number of tweets and users can be seen in table . in this section, we will introduce the methodology used to characterize the twitter users. first the retweet networks (section . ) and the algorithm to find communities (section . ). then, the different metrics which describe the interaction networks among them (section . ). after that, the features obtained by analyzing the text of the tweets (section . ). finally, we describe the supervised learning model which uses the individual's characteristics as instances and predicts the shifting users. we represent the interaction among individuals in terms of a graph, where users are nodes and retweets between them (one or more) are edges (undirected and unweighted). isolated nodes (never retweeting nor retweeted) were not taken into account for this analysis. in figure , we can visualize the retweet network for each time period and dataset. in the case of the us dataset, most of the users are concentrated in two groups, which allows to visualize the political polarization. on the other hand, in the argentinean datasets we can identify two large groups and also some smaller ones. the graph visualizations are produced with force atlas layout using gephi software [ ]. in a given graph, a community is a set of nodes largely connected among them and with little or no connection with nodes of other communities [ ] . we implement an algorithm to detect communities in large networks which allows us to characterize the users by their relationship with other users. in this context, the modularity is defined as the fraction of the edges that fall within a given community minus the expected fraction if edges were distributed at random [ ] . the louvain method for community detection [ ] seeks to maximize modularity by using a greedy optimization algorithm. this method was chosen to perform the analysis due to the characteristics of the database. while other algorithms such as label propagation are good for large data networks, their performance decreases if clusters are not well defined [ ] . in contrast, in these cases the louvain or infomap methods obtain better results. however, given that the number of nodes is in the order of hundreds of thousands and edges in the order of one million, the louvain method has a better performance [ ] than other ones. despite having found several communities, we just considered the largest for each case. for the arg and arg dataset we used the four biggest communities because, when examining the text of the tweets and the users with the highest degree, each one had a clear political orientation corresponding to the four biggest political parties in the election. these communities are labeled as "cambiemos", "unidad ciudadana", "partido justicialista" and " pais" for arg and "frente de todos", "juntos por el cambio", "consenso federal" and "frente de izquierda-unidad" for arg (electoral context is provided in the appendix). 
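as a concrete illustration of the retweet network and the louvain step described above, here is a minimal sketch assuming the networkx and python-louvain packages; the toy edge list and the number of retained communities are placeholders.

import networkx as nx
import community as community_louvain  # python-louvain package
from collections import Counter

# users are nodes; one or more retweets between two users become a single
# undirected, unweighted edge (isolated users never enter the graph)
retweet_pairs = [("alice", "bob"), ("carol", "bob"), ("dave", "erin")]  # toy data
G = nx.Graph()
G.add_edges_from(retweet_pairs)

# greedy modularity optimisation (louvain); returns {user: community_id}
partition = community_louvain.best_partition(G, random_state=42)
print("modularity:", community_louvain.modularity(partition, G))

# keep only the largest communities (four in the argentinean datasets, two in the us one)
n_kept = 2
largest = [c for c, _ in Counter(partition.values()).most_common(n_kept)]
kept_users = {u for u, c in partition.items() if c in largest}
print(kept_users)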
regarding the us dataset, we used the two biggest communities because of the bipartisan political system of the united states (republicans and democrats) and the clear structure present in the retweet networks, where only two big clusters concentrate almost all of the users and interactions (see figure ). in contrast, the argentinean election datasets have two principal communities and some minor communities as well. considering the fact that our dataset has more than million tweets and more than . million users, it was not feasible to determine true labels of political identification of the users for this task, nor was it viable to assign them manually. therefore, we decided to use the community labels of the retweet network as a proxy of political membership, and to interpret changes in their label as changes in affiliation over time. this decision is supported by previous literature, where it is shown that communities identify a user's ideology and political membership [ , , , , , ] . moreover, taking into account the stochasticity of the louvain method and following [ ] , we decided to use for the machine learning task only the nodes that were always assigned to the same community, in order to minimize the possibility of an incorrect labeling. additionally, we did not use individuals with fewer than retweets, since we might have insufficient data to correctly classify them. finally, we also manually sampled and checked users from different communities to verify their political identification. with the intention of topologically characterizing the users of the primary election network, we computed the following metrics: the degree of each user in the network (i.e., the number of users that have retweeted a given one), pagerank [ ], betweenness centrality [ ], clustering coefficient [ ] and cluster affiliation (the community detected by the louvain method). we used all these metrics as features in the machine learning classification task. in order to determine the topics of discussion during the primary election, we analyzed the text of the tweets using natural language processing and calculated a low dimensional embedding for each user. the tweets were described as vectors through the term frequency - inverse document frequency (tf-idf) representation [ ] . each value in the vector corresponded to the frequency of a word in the tweet (the term frequency, tf) weighted by a factor which measures its degree of specificity (the inverse document frequency, idf). we used -grams and a modified stop-words dictionary that not only contained articles, prepositions, pronouns and some verbs, but also the names of the candidates, the parties and words like "election". then, we constructed a matrix m by concatenating the tf-idf vectors, with dimensions the number of tweets times the number of terms. we performed topic decomposition using non-negative matrix factorization (nmf) [ ] on the matrix m. nmf is an unsupervised topic model which factorizes the matrix m into two matrices h and w (m ≈ hw) with the property that all three matrices have no negative elements. we selected the nmf algorithm because this non-negativity makes the resulting matrices easier to inspect and to understand their meaning. the matrix h is a representation of the tweets in the topic space, in which the columns are the degrees of membership of each tweet in a given topic. on the other hand, the matrix w provides the combination of terms which describes each topic [ ] .
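the following sketch illustrates the tf-idf/nmf step with scikit-learn; the toy corpus, the stop-word list, the number of topics and the per-user aggregation are placeholders rather than the authors' exact settings.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

tweets = ["la economia y el dolar", "justicia por santiago maldonado",
          "la crisis en venezuela"]                      # toy corpus
stopwords = ["la", "el", "en", "por"]                    # plus candidate and party names

vectorizer = TfidfVectorizer(stop_words=stopwords)
M = vectorizer.fit_transform(tweets)                     # tweets x terms (tf-idf matrix)

n_topics = 3
nmf = NMF(n_components=n_topics, random_state=0)
H = nmf.fit_transform(M)                                 # tweets x topics (memberships)
W = nmf.components_                                      # topics x terms (topic definitions)

terms = np.array(vectorizer.get_feature_names_out())
for k, topic in enumerate(W):                            # top terms, for manual labelling
    print(f"topic {k}:", terms[topic.argsort()[::-1][:3]])

def user_topic_shares(tweet_rows, H, n_topics):
    """fraction of a user's tweets whose dominant topic is each of the k topics."""
    dominant = H[tweet_rows].argmax(axis=1)
    return np.bincount(dominant, minlength=n_topics) / max(len(tweet_rows), 1)

print(user_topic_shares([0, 1], H, n_topics))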
the obtained results, analyzing just the tweets corresponding to the first time period, are detailed in the appendix. the decomposition dimension was swept between and , and for each dataset we chose a number of topics in the corpus so as to have a clear interpretation of each one. the same methodology was used and described in [ , ] . once we collected all this information, twitter users were also characterized by a vector of features where each cell corresponds to one of the topics and its value to the percentage of tweets the user tweeted with that topic. given that our objective was to identify shifting individuals and persuasive arguments, we implemented a predictive model whose instances are the twitter users who were active during both time periods [ ] and belonged to one of the biggest communities in both time periods networks. consequently, the number of users used at this stage was reduced. individuals were characterized by a feature vector with components corresponding to the mentioned topological metrics depicted in section . and others corresponding to the percentage of tweets in each one of the topics extracted in section . . the information used to construct these embedding was gathered from the whole first time period retweet network. the target was a binary vector that takes the value if the user changed communities between the first and the second time periods and otherwise. the summary of the datasets is shown in table . considering the percentage of positive targets, this is clearly a class imbalance scenario. specially in us, which is reasonable given the bipartisan retweet network with big and opposed communities [ ] . the gradient boosting technique uses an ensemble of predictive models to perform the task of supervised classification and regression [ ] . these predictive models are then optimized iteration by iteration using the gradient of the cost function of the previous iteration. in this scenario, xgboost, a particular implementation of this technique, has proven to be efficient in a wide variety of supervised scenarios outperforming previous models [ ] . we used a / random split between train and test. in order to do hyperparameter tuning, we used the randomized search method [ ] over the training dataset with -fold cross-validation, which consists of trying different random combinations of parameters and then staying with the optimum. with the objective of measuring the efficiency and performance of our machine learning model, two other models, namely random and polar, were taken as baselines for comparison. in the former one, the selected user will change of community with a probability of %. in the latter, for a user that belongs to one of the two biggest communities in the network, we predict that he/she will stay in that community, while a user that belongs to a smaller community will change to one of the two main communities with same probability. this polar model is inspired by idea that in a polarized election, members of the smallest communities shift and are attracted to the biggest communities, and was used in the argentinean datasets. 
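a minimal sketch of the supervised step described above, assuming the scikit-learn wrapper of xgboost; the synthetic features, the 70/30 split, the 5-fold cross-validation and the parameter grid are placeholders (the exact values are elided in this extract).

import numpy as np
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 10))                      # per-user features: graph metrics + topic shares
y = (rng.random(500) < 0.1).astype(int)        # 1 = the user changed community (imbalanced)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

param_distributions = {
    "n_estimators": [100, 300, 500],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.05, 0.1],
    "subsample": [0.7, 0.9, 1.0],
}

search = RandomizedSearchCV(
    XGBClassifier(eval_metric="logloss"),      # gradient boosting ensemble
    param_distributions,
    n_iter=10,                                 # random combinations of parameters
    cv=5,                                      # k-fold cross-validation on the training split
    scoring="roc_auc",
    random_state=0,
)
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)
print("test auc:", search.score(X_test, y_test))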
we trained three different gradient boosting models for each dataset: the first one was trained only with the features obtained via text mining (how many tweets of the selected topics the user talks about); a second one was trained just with features obtained through complex network analysis (degree, pagerank, betweeness centrality, clustering coefficient and cluster affiliation); and the last one was trained with all the data. in this way, we could compare the importance natural language processing and complex network analysis for this task. in figure we can see the roc [ ] of the different models for each dataset. the best performance is obtained in all cases by the machine learning model built with all the characteristics of the users, which is able to efficiently predict which users are shifting individuals. this result is expected, since an assembly of models manages to have sufficient depth and robustness to understand the network information, the topics of the tweets and the graph characteristics of the users. we performed random permutation of the features values among users in order to understand which of them are the most important in the performance of our model (the so-called permutation feature importance algorithm [ ] ). in figure , we observe that the most important feature in all cases corresponds to the node's connectivity: pagerank, meaning that shifting individuals are the peripheral and least important nodes of big communities. the result is verifiable when comparing the pagerank averages in users who changed their affiliation ( arg pr = . e − , arg pr = . e − and us pr = . e − ) with those who did not ( arg pr = . e − , arg pr = . e − and us pr = . e − ), the latter being at least % higher. this is also consistent with the fact that the model trained with network features gets a better au c than the model trained with the texts of user tweets in all datasets. previous works have used text, twitter profile and some twitting behavior characteristics to automatically classify users with machine learning, but none of them have incorporated the use of these graph metrics [ ] [ ] [ ] [ ] ] . our work shows the importance of also including these graph features in order to identify shifting individuals. this result has a relevant sociological meaning: the unpopular individuals are more prone to change their opinion. besides the importance of the mentioned topological properties, some discussed topics are also relevant to the classifier model. a simple analysis of the most spoken topics in the network does not differentiate between topics discussed by a shifting individual and other users. considering that most users do not change their affiliation, it is interesting to analyze those that do change. the persuasive arguments theory affirms that changes in opinion occurs when people exchange strong (or persuasive) arguments [ , , ] . consequently, we defined a "persuasive topic" as a topic used primarily by shifting individuals and not used by non shifting individuals. with the intention of doing a deeper analysis of the topic embedding for the arg dataset, we first enumerate the main topics in that corpus: equivalent analysis can be done with the other two corpora and the topic decomposition for each can be found in the appendix. in figure , the most important topics for the classifier are "venezuela", "economy" and "santiago maldonado". 
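to make the evaluation step concrete, the sketch below computes the roc auc and the permutation feature importance with scikit-learn on synthetic data; the feature names and the dependence of the label on "pagerank" are purely illustrative.

import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

feature_names = ["degree", "pagerank", "betweenness", "clustering", "community",
                 "topic_0", "topic_1", "topic_2", "topic_3", "topic_4"]

rng = np.random.default_rng(1)
X = rng.random((500, len(feature_names)))
# synthetic label driven mostly by the "pagerank" column, so the ranking is non-trivial
y = (X[:, 1] + 0.1 * rng.standard_normal(500) < 0.2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
model = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss").fit(X_tr, y_tr)
print("roc auc:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

# permutation feature importance: shuffle one feature at a time and measure the auc drop
result = permutation_importance(model, X_te, y_te, scoring="roc_auc",
                                n_repeats=20, random_state=1)
for idx in result.importances_mean.argsort()[::-1]:
    print(f"{feature_names[idx]:<12} {result.importances_mean[idx]:+.4f}")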
we can contextualize these results by looking which are the main topics discussed in each community as well the ones discussed among the users that change between them, as it is shown in figure . we can see that "venezuela" is one of the most discussed topics in the people remaining in four communities and "santiago maldonado" is a relevant topic in the communities "unidad ciudadana" and " pais". when we look at the main topics discussed by users that change their communities between elections, we can observe that "venezuela" identifies those that go from "partido justicialista (pj)" to " pais" and "cambiemos" meanwhile "santiago maldonado" is a key topic among those who arrive to "unidad ciudadana" from "partido justicialista (pj)" and " pais". considering that these topics are considerably more used by the shifting twitter users than by the other users, it can be affirmed that these are "persuadable topics". in contrast, other topics such as "economy" or "santa cruz" were also commonly used by most of the users but not by the shifting individuals. in this paper we presented a machine learning framework approach in order to identify shifting individuals and persuasive topics that, unlike previous works, focused on the persuadable users rather than studying the political polarization on social media as a whole. the framework includes natural language processing techniques and graph machine learning algorithms in order to describe the features of an individual. also, three datasets were used for the experimentation: arg, arg and us. these dataset were constructed with tweets from countries, during different political contexts (during a parliamentary election, during a presidential election and during a non-election period) and in a multi-party system and a two-party system. the machine learning framework was applied to these different datasets with similar results, showing that the methodology can be easily generalized. the implemented predictive models effectively detected whether the user will change his/her political affiliation. we showed that the better performance can be achieved when representing the individuals with their community and other graph features rather than topic embedding. therefore, our results indicate that these proposed features do a reasonable job at identifying user characteristics that determine if a user changes opinion, features that were neglected in previous works of user classification on twitter [ ] [ ] [ ] [ ] ] . in particular, the pagerank was the most relevant according to the permutation feature importance analysis in all datasets, showing that popular people have lower tendencies to change their opinion. finally, the proposed framework also identifies which of the topics are the persuasive topics and good predictors of individuals changing their political affiliation. consequently, this methodology could be useful for a political party to see which issues should be prioritized in their agenda with the intention of maximizing the number of individuals that migrate to their community. understanding the characteristics and the topics of interest of politically shifting individuals in a polarized environment can provide an enormous benefit for social scientists and political parties. the implications of this research supplement them with tools to improve their understanding of shifting individuals and their behavior. 
the percentages on the arrows are the percentages of users that changed from one community to the other (when the percentage was less than %, the corresponding arrow is not drawn). the topics on the arrows show the most important topics among the users that change between those communities.
• the president of argentina and the governor of the province of buenos aires at the time of the elections (i.e., "mauriciomacri", "macri" and "mariuvidal"). these last two were added, despite not being actively present in the lists, due to their political importance and their relevance and participation during the campaign. in addition, the tweets were restricted to be in spanish.
the electoral context is the following: former president and opposition leader cristina fernández de kirchner (former "unidad ciudadana") and sergio massa (former " pais") created a new party, "frente de todos", with alberto fernández as candidate for president. on the other hand, mauricio macri (former "cambiemos") ran for reelection as the candidate of "juntos por el cambio". the socialist nicolas del cao of "frente de izquierda-unidad" and roberto lavagna of "consenso federal" were also candidates for president, among others. considering the previous subsection and the candidates for the senate, for deputy and for governor, the following terms were chosen as keywords for twitter: "elisacarrio", "ofefernandez ", "patobullrich", "macri", "macrismo", "mauriciomacri", "pichetto", "miguelpichetto", "juntosporelcambio", "alferdez", "cfkargentina", "cfk", "kirchner", "kirchnerismo", "frentetodos", "frentedetodos", "lavagna", "rlavagna", "urtubey", "urtubeyjm", "consensofederal", " consensofederal", "delcao", "nicolasdelcano", "delpla", "rominadelpla", "fitunidad", "fdeizquierda", "fte izquierda", "castaeira", "manuelac ", "mulhall", "nuevomas", "espert", "jlespert", "frentedespertar", "centurion", "juanjomalvinas", "hotton", "cynthiahotton", "biondini", "venturino", "frentepatriota", "romeroferis", "partidoautonomistanacional", "vidal", "mariuvidal", "kicillof", "kicillofok", "bucca", "buccabali", "chipicastillo", "larreta", "horaciorlarreta", "lammens", "matiaslammens", "tombolini", "matiastombolini", "solano", "solanopo", "lousteau", "gugalusto", "recalde", "marianorecalde", "ramiromarra", "maxiferraro", "fernandosolanas", "marcolavagna", "myriambregman", "cristianritondo", "massa", "sergiomassa", "gracielacamano", "nestorpitrola". in addition, the tweets were restricted to be in spanish. also, the topic embedding obtained with non-negative matrix factorization:
sergio massa of " pais" (former chief of the cabinet of ministers of cristina kirchner, then leader of the opposition against cristina kirchner in when he won his provincial election) and florencio randazzo of
twitter keywords: considering the previous subsection, the following terms were chosen as keywords for twitter
• candidates for senate of the main four parties: their name and official user on twitter
topic decomposition: the topic embedding obtained with non-negative matrix factorization:
c tweets of donald trump: the following term was used as keyword for the twitter api: "realdonaldtrump". in addition, the tweets were restricted to be in english. also, the topic embedding obtained with non-negative matrix factorization:
- president donald trump: the president of the united states.
- obamagate: the accusation that barack obama is conspiring against donald trump.
- world health organization: president trump announcing the us will pull out of the world health organization.
- thank you: individuals thanking president trump for his actions in regard to the covid- pandemic.
- fake news: individuals discussing and claiming that certain news are fake.
- president barack obama: the president of the united states and his administration.
references:
- from friendster to myspace to facebook: the evolution and deaths of social networks (longislandpress)
- emotions in health tweets: analysis of american government
- what the hashtag? a content analysis of canadian politics on twitter (information, communication & society)
- political communication and influence through microblogging - an empirical analysis of sentiment in
- analyzing the digital traces of political manipulation: the russian interference twitter campaign
- politics, sentiments, and misinformation: an analysis of the twitter discussion on the mauricio
- interest communities and flow roles in directed networks: the twitter network of the uk riots (journal of the royal society interface)
- donald j. trump and the politics of debasement (critical studies in media communication)
- using machine learning to detect cyberbullying
- sentiment analysis in data of twitter using
- quantifying time-dependent media agenda and public opinion by topic modeling (physica a: statistical mechanics and its applications)
- a scalable tree boosting system (proceedings of the acm sigkdd international conference on knowledge discovery and data mining)
- tree boosting with xgboost - why does xgboost win every machine learning competition?
- random search for hyper-parameter optimization
- comparing effect sizes in follow-up studies: roc area, cohen's d, and r (law and human behavior)
- permutation importance: a corrected feature importance measure
- identifying communicator roles in twitter (proceedings of the international conference on world wide web)
- use of machine learning to detect illegal wildlife product promotion and sales on twitter (frontiers in big data)
- analyzing mass media influence using natural language processing and time series analysis
- quantifying controversy in social media (proceedings of the ninth acm international conference on web search and data mining)
- mining twitter for fine-grained political opinion polarity classification, ideology detection and sarcasm detection (proceedings of the eleventh acm international conference on web search and data mining)
- consensus clustering in complex networks (scientific reports)
- measuring polarization in twitter enabled in online political conversation: the case of us presidential election
- judgments and group discussion: effect of presentation and memory factors on polarization (sociometry)
- why do humans reason? arguments for an argumentative theory
- persuasive arguments theory, group polarization, and choice shifts (personality and …)
- content and network dynamics behind egyptian political polarization on twitter (proceedings of the acm conference on computer supported cooperative work & social computing)
- measuring political polarization: twitter shows the two sides of venezuela (chaos)
- investigating political polarization on twitter: a canadian perspective (policy & internet)
- testing two classes of theories about group induced shifts in individual choice

key: cord- -t lwqrpb authors: whaiduzzaman, md; hossain, md.
razon; shovon, ahmedur rahman; roy, shanto; laszka, aron; buyya, rajkumar; barros, alistair title: a privacy-preserving mobile and fog computing framework to trace and prevent covid- community transmission date: - - journal: nan doi: nan sha: doc_id: cord_uid: t lwqrpb to slow down the spread of covid- , governments around the world are trying to identify infected people and to contain the virus by enforcing isolation and quarantine. however, it is difficult to trace people who came into contact with an infected person, which causes widespread community transmission and mass infection. to address this problem, we develop an e-government privacy preserving mobile and fog computing framework entitled ppmf that can trace infected and suspected cases nationwide. we use personal mobile devices with contact tracing app and two types of stationary fog nodes, named automatic risk checkers (arc) and suspected user data uploader node (sudun), to trace community transmission alongside maintaining user data privacy. each user's mobile device receives a unique encrypted reference code (uerc) when registering on the central application. the mobile device and the central application both generate rotational unique encrypted reference code (ruerc), which broadcasted using the bluetooth low energy (ble) technology. the arcs are placed at the entry points of buildings, which can immediately detect if there are positive or suspected cases nearby. if any confirmed case is found, the arcs broadcast pre-cautionary messages to nearby people without revealing the identity of the infected person. the suduns are placed at the health centers that report test results to the central cloud application. the reported data is later used to map between infected and suspected cases. therefore, using our proposed ppmf framework, governments can let organizations continue their economic activities without complete lockdown. t he novel corona virus disease in (covid- ) has spread rapidly worldwide in a short duration. it caused a significant public health crisis worldwide, and by the end of may (i.e., within the five months of its first infection detection), over . million persons were infected, and over thousand have died [ ] . therefore, governments worldwide seek solutions to minimize the infected cases from the covid- pandemic by employing mobile application md whaiduzzaman, and alistair barros are with queensland university of technology, queensland, australia (e-mail: wzaman@juniv.edu, alistair.barros@qut.edu.au). md. razon hossain, and ahmedur rahman shovon are with jahangirnagar university, dhaka, bangladesh (e-mail: hossainmdrazon@gmail.com, shovon.sylhet@gmail.com). shanto roy, and aron laszka are with university of houston, tx, usa (e-mail: shantoroy@ieee.org, laszka.aron@gmail.com). rajkumar buyya is with university of melbourne, australia (e-mail: rbuyya@unimelb.edu.au). based contact tracing [ ] , [ ] . mobile apps can help trace both infected and suspected cases in almost real-time, and governments are rushing towards developing and deploying such applications and frameworks. however, several applications raise significant privacy issues as they collect sensitive and personally-identifiable data from users, and lack user control and transparency in data processing or usage [ ] - [ ] . governments are enforcing temporary lockdowns of cities to slow down the spread of covid- , causing tremendous economic losses. 
however, we can alleviate the economic impact by avoiding wide-scale lockdowns and performing more targeted isolation. therefore, introducing fog computing in economic zones (e.g., shopping malls, organization buildings) can ensure continued economic activities by alerting nearby people, while mobile computing (mobile apps) can help to trace the infected and suspected cases. however, to the best of our knowledge, there is no integrated fog computing framework alongside contact tracing mobile apps that allows tracing community transmission while preserving users' data privacy. therefore, we introduce the following research questions to find an appropriate solution. q1. background and issues: what are the issues and privacy concerns in existing contact tracing apps? q2. mobile and fog computing: how to utilize mobile and fog computing to trace and prevent covid- community transmission? q3. privacy-preserving framework: how to develop an automated privacy-preserving e-government framework? we answer the first question by looking into the background and issues of existing application frameworks as well as user data privacy concerns (section ii). we find that there are several mobile application frameworks developed by governments and third parties to trace covid- community transmission. however, most of these applications and frameworks have failed to ensure user data privacy and suffer from other issues, such as mandatory use of apps, excessive data gathering, questionable transparency of source codes and data flow, unnecessary data usage or processing, and lack of user control in data deletion. we answer the second question by presenting the design considerations, architecture, and workflow of our e-government application framework that utilizes mobile and fog computing (section iii). the system consists of two types of fog nodes (arc and sudun), several restful apis, and a central application. users can register themselves using the user api. the arc is used to check the risk of users visiting any public place (e.g., shopping mall, organization building). test centers send the covid- test results to the central application using the result api. if the test result is positive, then the user is requested to upload the locally stored contact tracing information to the cloud using a sudun or directly from the mobile app; a minimal sketch of these api roles is given further below. figure presents an overview of the proposed system. we answer the third question by discussing the implementation overview and user data privacy solutions of our framework (section iv). here, we discuss the framework development based on amazon web services (aws) solutions. then we present the privacy preservation based on user control (voluntary use, compliance, and user consent), minimal data collection (mobile number, postal code, and age group), data destruction at the user's will, transparency (open source codes, clean data flow), and limited further usage of data. scope: we intend our framework to be deployed and controlled by the central government of a country. governments have access to the test results and can control such an integrated mobile-fog computing framework. moreover, relying on a private entity to manage such a framework can limit the preservation of data privacy. additionally, in this work, we primarily consider and focus on user data privacy issues. apart from building a standard and secure data processing framework, we do not discuss advanced security threats related to the mobile, fog, and cloud layers, as there is rich literature on existing security measures [ ] - [ ] .
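as a rough illustration of the api roles mentioned above (the result api used by test centers and the risk check queried by an arc), here is a minimal flask sketch; the endpoint names, payload fields and the in-memory store are assumptions - the actual framework is described as being built on aws services.

from flask import Flask, jsonify, request

app = Flask(__name__)
flagged_ruercs = set()       # codes of confirmed/suspected cases (toy in-memory store)

@app.route("/api/result", methods=["POST"])
def report_result():
    """a test center reports a result; positive cases flag the user's recent codes."""
    payload = request.get_json(force=True)
    if payload.get("result") == "positive":
        flagged_ruercs.update(payload.get("ruercs", []))
    return jsonify({"status": "recorded"}), 201

@app.route("/api/risk-check", methods=["POST"])
def risk_check():
    """an arc sends the codes it currently observes; the reply never identifies anyone."""
    nearby = set(request.get_json(force=True).get("ruercs", []))
    return jsonify({"risk": bool(nearby & flagged_ruercs)})

if __name__ == "__main__":
    app.run(port=5000)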
outline: the rest of the paper is organized as follows: section ii introduces the privacy and general issues of existing contact tracing frameworks. section iii discusses the design considerations, system components, and workflow of our proposed e-government framework. section iv presents the implementation overview and privacy-oriented solutions of the framework. finally, section v discusses a few related privacy-preserving frameworks followed by a conclusion. contact tracing and isolation of cases are required to control infectious disease outbreaks, and the consequence depends mainly on government actions and citizen responses [ ] . manual tracing is difficult and time-consuming in a pandemic situation, and governments are pushing for digital surveillance to contain the spread of the virus in the context of health informatics [ ] . however, these mobile apps have raised questions over user data privacy since digital surveillance involves location tracking, limits individual freedom, and expose confidential data [ ] . governments can stop transmission of covid- if they identify cases and their contacts quickly and get them to limit their connections with other people. cases should isolate themselves as long as they are infectious-for at least days after they became ill. contacts must quarantine for days after the last contact with an infectious patient [ ] . some cases may have close contact with many people because of where they have been or where they live, and these situations should immediately be reported. contact tracing might be difficult as an infected case may not remember all people who came in their contacts. additionally, an infected person may not know or remember the phone numbers and address of their contacts. it may take longer for authorities as well due to the required time to identify and get in touch with contacts. therefore, mobile-based contact tracing might help track several contacts and determine who is at highest risk for infection. contact tracing usually requires three primary steps: contact identification, listing, and follow-up [ ] . people with contact with an infected person are considered as suspected cases if they were in proximal range for more than minutes, within feet [ ] or the distance of more than feet, but stayed nearby for an hour [ ] . due to the necessity of tracing infected or suspected cases, governments around the world have developed several tracing applications and frameworks [ ] . the most common features of these applications are live maps and news updates of confirmed cases, location-based tracking and alerts, quarantine and isolation monitoring, direct or indirect reporting, selfassessment, and covid- education [ ] . some governments involved third parties to develop such applications and encouraged citizens to use these applications. ) privacy concerns: since governments have been rushing to build tracing applications, the least they have considered about user data privacy. in most cases, users can be monitored and tracked in real-time without users consent. if such mobile applications store the location history, user movement can be traced as well [ ] . apart from location tracking, there are several other user privacy issues, such as excessive data collection, obscure data flow, lack of user control, and data usage policies. abeler et al. suggested that we can achieve contact tracing and data protection at the same time by minimizing data processing in the existing frameworks [ ] . 
however, many applications are not following such solutions and howell et al. has suggested five primary privacy concerns for covid- application frameworks [ ] . • voluntary or mandatory: it should be a voluntary act whether users download and use such tracing apps. with the growing concern over data privacy, unnecessary data collection, location tracking, and other issues, users must have free wills to decide. the government or any third party cannot mandate users to use these apps in any circumstances. • data usage limitation: people are concerned over the collected data usage for personal safety reasons [ ] . therefore, the collected data must have usage limitations. for example, tracing data can only be used for public health and safety. traced data cannot be used for any other purpose, e.g., law enforcement. • data destruction: mobile applications or a framework should automatically delete user records after a particular period [ ] (e.g., usually - days and no longer than days). otherwise, users should have manual control over data deletion from the app or the central server. • minimal data collection: several applications collect excessive, unnecessary data from its users, for example, an application named "aarogya setu" requires name, phone number, age, gender, profession, and details of countries visited in the last days. also, geo-location tracing is unnecessary alongside bluetooth or other similar wireless technologies [ ] . • transparency: the entire process of data collection and usage should be transparent to preserve user privacy. application frameworks should have publicly available policies, clear and concise data flow and database, and open-source codes for transparency. additionally, users should have full control over their data usage. therefore, developers must follow the compliance and consent rules (gdpr, hipaa, ccpa, etc.) strictly [ ] , [ ] . apart from the privacy issues mentioned above, there are certainly other things to consider, as well. for example, the mobile application generated ids can be breached, decrypted, and resulted in exposing user information. additionally, applications controlled by the third party may pose a severe threat as they can misuse the collected user data. therefore, in terms of privacy, user control over the owner's data and transparency are notable factors. ) general concerns: apart from several data privacy issues, different design issues are prevailing in the contact tracing apps. many applications require a constant internet connection while it is entirely unnecessary. in some other implementations, the user can not turn off the background service of the apps, and these apps do not feature turn-off option. the apps continue to work at random times, such as while staying at home or sleeping. therefore, it causes battery drains too fast. tracing correctness depends on the distance and period of contact. several applications stores the ruerc of a nearby user device even there is a wall between two persons. moreover, rushing to develop such applications result in false positive suspected victim considerations and community transmission. communication between multiple platforms may appear troublesome and can lead to unexpected behavior. therefore, using google/apple api might be helpful to introduce bluetooth communication between two devices of different platforms (google android and apple ios). 
apart from these issues, insecure source code, weak data flows and processes, and different bluetooth-based device attacks can cause security hazards.
) google/apple exposure notification api: google and apple announced their privacy-preserving exposure notification in april and released phase one in may [ ] . the api uses ble technology and applies different hashing algorithms to generate different keys at specific time intervals [ ] to prevent wireless tracking. infected cases upload only the daily generated keys of the past days, and other users download the lists of keys of infected cases in their corresponding region. all the key generation and risk-level checks are performed on the user's mobile device, preserving user privacy. however, this also raises a concern about the resource consumption of that particular mobile device [ ] . moreover, as the user only uploads their keys, the authority cannot notify the contacts of the infected case immediately. some other essential functionalities, such as preventing community transmission and identifying asymptomatic spreaders, can also be troublesome. the responsibility of implementing this technology lies with the corresponding public health authorities; however, the authorities are required to follow the guidelines on privacy, security, and data control as well as the development criteria, such as the file and data formats for storing, uploading, and downloading keys [ ] , [ ] . this collaborative development of the exposure notification ensures that both platforms (android and ios) transmit and receive similar keys, and that the risk levels the apps calculate are also similar.
current contact tracing apps backed by governments have numerous design and privacy issues due to the rush for community transmission tracing. even if the existing issues are addressed, the question remains whether governments can continue economic activities alongside. therefore, our integrated mobile and fog computing-based framework can solve these issues by effectively tracing community transmission while organizations run their economic activities.
our proposed privacy-preserving e-government framework has four major components: the user mobile device, two types of fog nodes (arc and sudun), and a central cloud application that integrates these nodes. the mobile unit consists of ble, a privacy dashboard, the filter algorithm, file storage, and a communication service. a user mobile device advertises its own ruerc, scans for the ruercs of other nearby devices, and saves the filtered ruercs (see the filtering algorithm) in the file storage. fog-based iot healthcare optimizes data communication, lowers power consumption, and improves efficiency in terms of cost, network delay, and energy usage [ ] , [ ] . the ble in the arc and sudun advertises a specific predefined uerc, and the mobile unit does not save these uercs to file storage. when the user is in proximity to an arc (hospital, shopping mall, office), the ble of the arc receives all the ruercs around, and the fog component checks whether there is any infected or suspected case. the cloud application responds with a positive or negative result without disclosing the victim's identity. if there is a positive case, the arc transmits another predefined uerc that signifies the risk level and alerts all the mobile units around, including the infected or suspected case, hence preserving the victim's privacy. the other fog node, sudun, is set up either in the test centers or wherever the authority finds it necessary.
when the mobile unit receives the predefined uerc from this fog node, it checks its privacy dashboard. if the privacy dashboard allows the mobile unit to send the file to sudun, the mobile unit establishes a connection with sudun and sends the file. then, the fog component in sudun transfers the file to the corresponding cloud. figure presents the detailed workflow of our proposed integrated mobile and fog computing framework. we introduce the following features of the mobile application and fog nodes in our integrated framework:
) contact tracing: a user gets a hashed unique reference code from the system upon registration. every two hours, another unique reference code is generated in the application, which is shared with nearby devices upon user consent. this ensures that the broadcast data cannot be used to trace an individual. the same hashed value is generated in the cloud application, and it can be used to check the risk level of any individual without disclosing his or her identity to others.
) self-checking: users can check whether they were near any infected victim in the last days using the mobile application. the application uploads the locally stored reference codes of the devices it came in contact with in the last days. the cloud application checks whether any of the uploaded ruercs is listed as an infected or suspected victim, and the application notifies the user accordingly (a sketch of this check appears after this list).
) minimum mobile computation: our proposed framework ensures minimum computation by enabling user control to turn scanning and background services on or off. apart from that, the mobile application features delayed broadcast by avoiding unnecessary frequency, and it requires minimal internet connectivity, as users only need the internet while registering and while uploading data for a self-check.
) user data privacy: user personal data is not stored or shared in the system. we ensure minimal data collection and avoid user location tracking or digital surveillance. fog nodes do not identify any infected victim.
) fog node alerts: to minimize community transmission, we introduce the automatic risk checker in public places such as shopping malls and office buildings. as there is a chance of revealing the user's identity, we introduce a time delay and a minimum number of people present before broadcasting simple alert messages. these messages only request users to take precautions; they do not reveal any ruerc or risk radius.
our framework has four major components: a mobile application that broadcasts its ruerc and stores the ruercs received from nearby devices; the arc, which checks for infected cases and broadcasts alerts in organization buildings; the sudun, which uploads data from the devices of infected cases; and, finally, a central cloud application that integrates all these mobile and fog nodes and manages the collected data. users download the application from the google play store or a government server and install it on their mobile device. the application broadcasts its ruerc and receives the ruercs of other devices around. it also detects the special predefined uerc broadcast by the automatic risk checker (arc) and alerts the user about an infected or suspected case nearby.
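the self-check described in the list above is, in essence, a membership test of the uploaded ruercs against the cloud's infected and suspected lists. the following is a minimal sketch of such a cloud-side check, assuming the cloud can map each ruerc back to its originating uerc; all function and field names are illustrative and not taken from the paper.

```python
# hypothetical cloud-side self-check: map uploaded ruercs back to uercs and
# test them against the infected/suspected tables; names are illustrative.
def self_check(uploaded_ruercs, ruerc_to_uerc, infected_uercs, suspected_uercs):
    """return the risk status for a self-check request.

    uploaded_ruercs: ruercs collected by the requesting device.
    ruerc_to_uerc:   mapping maintained by the cloud (ruerc -> initial uerc).
    infected_uercs / suspected_uercs: sets built from the test-result tables.
    """
    matched = {ruerc_to_uerc[r] for r in uploaded_ruercs if r in ruerc_to_uerc}
    if matched & infected_uercs:
        return "contact with an infected case"
    if matched & suspected_uercs:
        return "contact with a suspected case"
    return "no known risky contact"
```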
) automatic risk checker (iot/fog) (arc): this fog device is set up inside hospitals, shopping malls, educational institutes, and all government-supervised organizations where there is a possibility of community transmission. the fog can receive the ruercs of the mobile devices within bluetooth range and interacts with the cloud service to detect any infected or suspected victim. if any ruerc is found to belong to an infected or suspected victim, it broadcasts a specific uerc throughout the place. the mobile application near the fog receives this specific uerc. the notification is broadcast only when there are at least persons within the range of the arc, to ensure the privacy of the infected victim. moreover, the notification is not sent instantly when the arc identifies an infected or suspected victim; it notifies the users after a certain period, which ensures that the identity of the infected or suspected victim is not disclosed to others.
) suspected user data uploader node (iot/fog) (sudun): these fog nodes are similar to the arc and are set up at the health centers or the covid- test centers. when a test center reports a positive test result, the infected person can voluntarily allow the application to enable uploading of its ruerc list from the privacy dashboard and keep the mobile phone within the range of a sudun. the sudun automatically connects with the user's device and fetches the stored list of hashed ruercs from the device. we enable real-time monitoring of each sudun using a cloud dashboard. additionally, the infected victim can also upload the ruerc list using the mobile application and an internet connection, without the help of a sudun.
the central cloud application is a government-provided secure data storage and computing server. receiving and storing data from the sudun, mining the data to predict essential patterns or super spreaders, computing the information required by the arc, and handling standard authentication to maintain security and privacy are its primary responsibilities.
c. system workflow
) user registration: initially, users need to register in the cloud using their mobile number, age group, and postal code. the mobile number field is mandatory; however, the age group and postal code are optional. the cloud application generates and sends a one-time password (otp) to the user-submitted mobile number through sms. then the user enters the otp in the mobile application, and the app sends the otp to the cloud. the cloud application matches the user-sent otp with the generated one. if both match, the cloud application generates a uerc for the user and sends it back to the mobile app installed by the user. finally, the mobile app stores the uerc in encrypted form on the user device and then proceeds to scan the ids broadcast by other users. the user device itself broadcasts a rotational uerc (derived from the initial uerc) as well. figure presents the complete registration process.
) key generation: our framework generates a rotational uerc (ruerc) at an interval of two hours. a person is considered a suspect case if he or she stays near an infected person for an hour. as the ruerc changes every two hours, users are likely to receive more than one ruerc if they stay nearby for longer than an hour (e.g., in case they are neighbors). our mobile application ensures that even if a person comes close at any minute, it will store the individual keys and compare them against the timestamps. this increases the accuracy of finding contacts without a rigorous need for computation. moreover, the ruerc enhances the privacy of the user, because always broadcasting the same uerc may result in wireless tracking. while registering, the mobile unit receives a uerc from the cloud, and from this uerc it generates ruercs using the aes algorithm at each two-hour interval for the next days. when two devices come close, they transmit and save these ruercs.
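the paper states only that the ruercs are derived from the initial uerc with aes at two-hour intervals; the sketch below fills in the unspecified details (key derivation by hashing the uerc, aes in ecb mode over an interval counter) purely as assumptions for illustration, and the retention window is left as a parameter.

```python
# hedged sketch of rotational uerc (ruerc) generation; the key derivation and
# cipher mode are NOT specified in the paper and are assumptions made here.
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

TWO_HOURS = 2 * 60 * 60  # rotation interval in seconds

def generate_ruercs(initial_uerc: str, num_days: int):
    """derive one ruerc per two-hour interval for the given number of days."""
    key = hashlib.sha256(initial_uerc.encode()).digest()   # assumed key derivation
    encryptor = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    ruercs = []
    for interval in range(num_days * 12):                  # 12 two-hour slots per day
        block = interval.to_bytes(16, "big")               # one aes block per slot
        ruercs.append(encryptor.update(block).hex())
    return ruercs
```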
figure presents the steps of ruerc generation. (figure caption: the cloud application (ca) generates rotational uercs using the aes algorithm every hours for the next days, mapped to the initial uerc; the mobile application (ma) also generates the same ruercs; when two mobile devices come nearby, they share their ruercs, which are stored on the device for the next days; the ma sends the stored ruercs to the ca, which finds the uerc of each ruerc, checks whether they are listed as infected or suspected victims, and notifies the user accordingly.)
) ruerc scan service: the scan service allows the application to run in the background. it creates a new thread so that the user does not experience any interruption while using other mobile apps. the user has the privilege to stop and then restart the service at any time. scanning and advertising ruercs, creating files, and saving data are its primary responsibilities. the scan service uses bluetooth low energy (ble) technology to share ruercs between two mobile devices. ble consumes significantly less power to communicate with, control, and monitor iot devices. it uses a "central" and "peripherals", which define a network called a piconet [ ] . the central scans for the advertisement, and the peripheral makes the advertisement. in our framework, the advertisement consists of the defined ruerc. the generic attribute profile (gatt) is designed to send and receive short pieces of data known as "attributes," and it is built on the attribute protocol (att), which uses a -bit unique id [ ] . in our framework, to maintain the minimal data collection policy and preserve device power, we omit the use of gatt and att. we use the peripheral to transmit the ruerc and the central to scan these ruercs. while scanning, we filter these ruercs with our suspect filtering algorithm and save them accordingly. the framework uses the received signal strength indicator (rssi) to identify the distance between devices. rssi shows the strength of the received signal. it is calculated in db and depends on the power and chipset of the broadcasting device. it also depends on the transmitting medium. if there is any obstacle between the receiver and the sender, the signal strength will decrease, and so will the rssi. therefore, for different manufacturers, the rssi measured in a specific medium shows different values; however, for a particular manufacturer, the rssi shows the intensity of the signal strength. rssi does not provide the accurate distance, but using the path loss model [ ] , an approximate distance can be found as rssi = −10 · n · log10(d) + c, where n is the path loss exponent that depends on the transmitting medium, d is the distance, and c is a constant. to maintain user privacy, the mobile application creates two files in the internal storage of the mobile device; even the user does not have permission to access these files. when the peripheral receives any signal, one file temporarily stores this signal information, and the other stores the information of the filtered signals. when the device comes within the range of a sudun, the signal information of the filtered file is sent to the sudun or directly to the cloud to perform a self-check.
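a minimal sketch of inverting the path loss model stated above to estimate distance from a measured rssi; the path loss exponent n and the constant c are device- and medium-dependent calibration values, so the defaults below are placeholders only.

```python
import math

def rssi_to_distance(rssi_dbm: float, n: float = 2.0, c: float = -40.0) -> float:
    """invert rssi = -10 * n * log10(d) + c to get an approximate distance d.

    n: path loss exponent (depends on the transmitting medium; placeholder 2.0)
    c: rssi measured at unit distance for this chipset (placeholder -40 dbm)
    """
    return math.pow(10.0, (c - rssi_dbm) / (10.0 * n))
```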
filtering: according to the cdc, a distance of six feet between the case and a contact is safe to maintain [ ] . a person is considered a contact if he/she is within six feet for at least fifteen minutes or is within proximity for an hour [ ] . the scan service scans all the ruercs within bluetooth range. however, considering the rules mentioned above, continuous scanning is unnecessary, as it causes battery and memory consumption. therefore, the scanning service scans for ms at three-minute intervals. we develop our suspect filtering algorithm considering these factors so that power and memory consumption are minimal and identifying suspects is more accurate. there are two types of files in the mobile device. one contains all the signal information (ruerc, distance, and timestamp) the device receives from nearby devices in each interval. if any received signal passes the filtering conditions, the associated information (ruerc, distance, timestamp, and duration) is stored in another file. this file is called the final file and is sent to the cloud once the user is found covid- positive. a specific ruerc is stored in the final file only once a day. the suspect filtering algorithm is shown in algorithm .
the framework is divided into three data processing layers to preserve user privacy and restrict access to data in various processes. users can register to the system through a public communication process via the mobile application. in this same access layer, the test centers can send the test results using restful api services over the https protocol. the users can connect to the arc and sudun fog nodes through the ble protocol and send data to the fog nodes. the computation and initial screening of user-uploaded data in the fog nodes is done in a protected network layer. the apis consumed by user mobile applications, test centers, and fog nodes are also in this layer. the main data processor, consisting of a set of application nodes, resides in a protected network, and users cannot access the data processor directly. the processed data is stored in data storage, which is configured in a restricted private subnet. this subnet can only be accessed from the ancestor private subnet, not from any public subnet. the data layer is protected from sql injection as it is not directly connected to any user entries. figure displays the process flow diagram.
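a minimal sketch of the suspect filtering conditions described above: a scanned ruerc is kept if it stayed within six feet for at least fifteen minutes, or remained in proximity for about an hour. the exact bookkeeping of the paper's algorithm (scan intervals, one final-file entry per ruerc per day) is simplified here, and field names are illustrative.

```python
from collections import defaultdict

CLOSE_FEET = 6.0          # cdc distance threshold
CLOSE_MINUTES = 15.0      # minimum close-range exposure
PROXIMITY_MINUTES = 60.0  # minimum exposure when merely in bluetooth proximity

def filter_suspect_contacts(scans):
    """scans: iterable of (ruerc, distance_feet, timestamp_minutes) observations."""
    first_seen, last_seen, close_time, prev_ts = {}, {}, defaultdict(float), {}
    for ruerc, distance, ts in sorted(scans, key=lambda s: s[2]):
        if ruerc in prev_ts and distance <= CLOSE_FEET:
            close_time[ruerc] += ts - prev_ts[ruerc]  # accumulate close-range time
        first_seen.setdefault(ruerc, ts)
        last_seen[ruerc] = ts
        prev_ts[ruerc] = ts
    contacts = []
    for ruerc, start in first_seen.items():
        duration = last_seen[ruerc] - start
        if close_time[ruerc] >= CLOSE_MINUTES or duration >= PROXIMITY_MINUTES:
            contacts.append({"ruerc": ruerc, "duration_minutes": duration})
    return contacts
```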
in this section, we discuss the implementation overview of our framework with regard to amazon web services. additionally, we discuss the database design, the analysis of our contact tracing graph, and our privacy-preserving solutions. we have implemented our framework using amazon web services (aws) [ ] , [ ] ; as it is a generic framework, it can be applied on other iot/cloud platforms too. the fog nodes, arc and sudun, consist of aws greengrass components and lambda functions. they are connected to the aws iot core service using the mq telemetry transport (mqtt) protocol. the authenticity and security of the fog nodes are ensured by using iot device defender, and the iot device management service is used to monitor and audit the fog nodes. incoming data from the fog nodes are passed to the simple queue service (sqs). the queue data is processed via lambda functions and moved into the application containers, which are managed via a kubernetes cluster within an auto-scaling group for dynamic scaling of the master nodes and worker nodes. after the data is processed in the application nodes, it is sent to the redis cache tier and eventually to the amazon relational database service (rds). for the administration of the cloud application, an ssh connection is provided via elastic load balancing to a bastion server. the users need to connect to the cloud application only during the registration process, via an https connection to a load balancer. the test centers send the test results using a restful api over https. the scheduled tasks, for example, generating the users' rotational uercs, are done in an elastic container service, which uses a cloudwatch event rule at a specific time of the day. the application generates alarms for any irregular activities, like device connection errors or access failures, via cloudwatch alarms to the proper authority. users are notified from the arc, sudun, and cloud application via the simple notification service. for further communication with authorized organizations and administrators, the simple email service is used. for restricting unauthorized access to the application and data layers, private subnets are used. to ensure efficient data availability in the cloud, a database instance is stored in a separate availability zone. figure illustrates the implementation overview.
the cloud database includes several tables to trace infected and suspected cases, to alert users, to prevent community transmission in organizations through the arc, and to collect test results from the sudun. the user table stores user registration information (mobile number, age group, postcode, and timestamp) and the initial uerc. as we are generating new uercs associated with the primary user uerc, we need a mapping between those unique reference codes. we store test results (result id, test organization id, timestamp, test result, and user mobile number) in a separate table and extract the infected users (affected uerc, test result id, and affected id). we then map between the registered user table and the affected user table to find the suspected users and store the suspected uerc, duration, timestamp, and distance radius, alongside generating a new suspect id. we also need an organization database where we store organization information (name, email, geo-location, address) and the associated permissions for the arc and sudun. figure presents the relations between all these tables.
to identify and trace covid- community transmission, we introduce and utilize a graph database called "neo j" [ ] to determine the contact graph. figure presents a portion of the contact tracing graph with synthetic contact tracing data. the red, blue, and green circles indicate the test centers, users, and test results, and the lines represent directed edges of the graph between infected and suspected victims. the tracing graph can be used to identify individuals who had come in close contact with infected victims almost in real time. we can determine potential super spreaders as well using the contact tracing graph. additionally, suspected victims identified by the graph can be informed to take cautionary actions to prevent community transmission.
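to illustrate how such a contact graph could be queried, the sketch below uses the neo4j python driver; the node labels, relationship types, and property names are assumptions made for illustration and are not the schema reported in the paper.

```python
# hypothetical query over the contact tracing graph: list users who were in
# prolonged contact with infected cases, ordered by their number of exposures.
from neo4j import GraphDatabase

FIND_SUSPECTS = """
MATCH (i:User {status: 'infected'})-[c:CONTACT]->(s:User)
WHERE c.duration_minutes >= 15
RETURN s.uerc AS suspect, count(i) AS exposures
ORDER BY exposures DESC
"""

def suspected_contacts(uri, user, password):
    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return [dict(record) for record in session.run(FIND_SUSPECTS)]
    finally:
        driver.close()
```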
) minimal data collection: initially, the system collects the user's age group, postal code, and mobile number for registration and, in return, provides a unique id (uerc) to be stored on the user's mobile device. then the mobile application collects the ruerc, timestamp, duration, and distance measure (based on rssi) from other mobile devices that came in contact with the user's device (figure : the cloud database consisting of the mandatory tables). the age group and postal code are optional fields; these fields are used to define clusters and identify super spreaders. the phone number is used to alert contacts to take precautions. no personal information of the user is collected during or after the registration procedure. figure b presents the mobile application user interface with mandatory and non-mandatory fields.
) data destruction: the mobile application deletes collected ruercs no later than days from the latest timestamp. in the cloud application, the uerc and associated data from the suspected table are deleted within days of the entry timestamp. additionally, a user has full control over his data and can delete it manually from the application storage instantly (a sketch of such pruning appears after this list). a user can also request account deactivation and deletion of data from the cloud database; however, it may take up to two weeks to synchronize the data across all replica databases.
) transparency: transparency is vital in data privacy, and governments should make the source code open and publish the user data flow in the whole framework. transparency motivates users to use an application and results in better management and increased public engagement. consent and compliance can play a crucial role in presenting transparent data processes and improving decision making.
) data usage limitation: the collected data should have limited use cases, and our framework opens the opportunity to determine super spreaders and a clustered view of positive cases. additionally, the infected and suspected lists can help calculate the risk factors in terms of community transmission. however, when deployed, further data usage should be minimal, and users need to have a clear idea of why and how their data is being processed.
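a minimal sketch of the retention rule in the data destruction item above: locally stored ruerc records older than the retention window are pruned. the record layout and the retention length are placeholders, not values taken from the paper.

```python
from datetime import datetime, timedelta, timezone

def prune_expired(records, retention_days):
    """keep only records whose 'timestamp' (a timezone-aware datetime) lies
    within the retention window; older ruerc entries are discarded."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=retention_days)
    return [r for r in records if r["timestamp"] >= cutoff]
```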
tang discussed a few existing privacy-preserving framework solutions that attempted to solve several user data privacy issues [ ] . the author provided observations and privacy solutions on singapore's "tracetogether", reichert et al.'s "mpc solution" [ ] , altuwaiyan et al.'s "matching solution" [ ] , and vaudenay's "dp t" solution [ ] .
singapore's tracetogether: affected users are compelled to share the locally saved information with the ministry of health; in our framework, sharing any information with the system is voluntary. additionally, there is a risk of decompiling the app and collecting the geo-location data from it. as we do not store any geo-location data anywhere, our framework does not carry such a risk.
the mpc solution: this solution requires the user's smart device to acquire and save geo-location data, and this location trace is shared with the health authority (ha) [ ] . this may raise a privacy issue for the users, as it does not mention whether they can preserve their privacy by controlling what information they want to share with the authority and what information they do not want to share. we provide a clear and concise privacy dashboard to the users so they can choose the information they want to share and alter their choices. the solution also does not consider the scalability of the computation over the extensive data set collected from the devices, and the ha is required to arrange garbled circuits (a cryptographic protocol that encrypts the computation) for all infected users. we take precautionary steps in the fog nodes to filter the collected data; additionally, the central application uses an auto-scaling technique and a queue service to handle the incoming data.
the matching solution: this solution utilizes a privacy-preserving matching protocol between users along with a proper distance measuring procedure [ ] . we use similar metrics to calculate the distance between mobile devices. on the other hand, the infected users' exact location is shared with the central server, which indicates a severe privacy violation. we do not use individual users' locations and thus preserve users' location privacy.
the dp- t solution: the dp- t solution provides a privacy-aware framework while considering three design considerations: low-cost design, unlinkable design, and hybrid design. the solution takes advantage of the content delivery network and provides a cost estimation per patient. the authors discussed privacy concerns such as the social graph (social relationships between users), interaction graph (physical interactions nearby), location traceability (tracing individuals), at-risk individuals (suspected cases), positive status (infected cases), and exposed location (partial identification of places a positive case visited). they also discussed the addressable privacy concerns in terms of the three design considerations.
covidsafe: the app collects the user's name, phone number, age range, and postcode with the consent of the user and sends them to the server. the app stores the encrypted user id, time of contact, and bluetooth signal strength, and keeps this data for days. the case can upload these data voluntarily [ ] . the government has amended the privacy act to prevent the misuse of these data [ ] . the user can uninstall the app to delete the local data; however, to delete all data from the server, the user needs to wait until the pandemic is over, whereas our framework allows users to delete all data within days.
table i presents the differences between existing mobile application frameworks and our integrated mobile-fog computing framework (ppmf). here, we categorize different features in terms of privacy-preserving approaches (voluntary, data usage limitation, data destruction, minimal data collection, and transparency), fog-based integrated solutions (risk check, infected/suspected data upload), and general design approaches (temporary ble ids, no geo-location trace, and minimal internet requirements). we put our framework (ppmf) at the end of the list and find that it ticks all these features.
in this work, we have presented a mobile and fog computing-based integrated framework that can trace and prevent community transmission alongside maintaining user data privacy. in ppmf, we consider minimal data collection and provide a temporary ruerc in encrypted form for storage on user devices. the arc is responsible for tracing positive cases in public places and sending alerts to nearby people without revealing the user's identity. the sudun uploads data about infected and suspected cases to the central database, which can be processed to identify a super spreader and visualize the clusters of cases based on postal codes and age groups. minimal and undetectable data collection, user control, and system transparency are essential factors to ensure user data privacy.
a privacy dashboard based on compliance and user consent makes it convenient and encourages citizens to use such a user-friendly e-government framework. therefore, using the structure of our proposed ppmf framework, governments can continue their economic activities while tracing and minimizing mass-level community transmission. in the future, we plan to develop a super spreader detection model and a clustering methodology for infected cases based on our framework.
references:
• home - johns hopkins coronavirus resource center
• use of apps in the covid- response and the loss of privacy protection
• pact: privacy sensitive protocols and mechanisms for mobile contact tracing
• contact tracing mobile apps for covid- : privacy considerations and related trade-offs
• apps gone rogue: maintaining personal privacy in an epidemic
• mobile edge computing, fog et al.: a survey and analysis of security threats and challenges
• security and privacy issues of fog computing: a survey
• a survey on cloud computing security: issues, threats, and solutions
• feasibility of controlling covid- outbreaks by isolation of cases and contacts
• big data for health
• investigation of three clusters of covid- in singapore: implications for surveillance and response measures
• social distancing, quarantine, and isolation
• covid- contact tracing, coursera, johns hopkins university, usa
• assessing disease exposure risk with location data; a proposal for cryptographic preservation of privacy
• covid- contact tracing and data protection can go together
• a flood of coronavirus apps are tracking us. now its time to keep track of them
• covid- contact tracing and privacy: studying opinion and preferences
• the value and ethics of using technology to contain the covid- epidemic
• aarogya setu faqs
• gdpr compliance: implementation use cases for user data privacy in news media industry
• exposure notification - faq v
• exposure notification - bluetooth specification pages
• energy consumption of hash functions
• exposure notifications service additional terms
• cloud-fog interoperability in iot-enabled healthcare solutions
• iot-based remote pain monitoring system: from device to cloud platform
• overview and evaluation of bluetooth low energy: an emerging low-power wireless technology
• distance measurement and error estimation scheme for rssi based localization in wireless sensor networks
• simpledb and sqs web services enables you to reach business goals faster
• amazon web services (aws) - cloud computing services
• neo j graph platform: the leader in graph databases
• privacy-preserving contact tracing: current solutions and open questions
• privacy-preserving contact tracing of covid- patients
• epic: efficient privacy-preserving contact tracing for infection detection, cryptology eprint archive
• privacy policy for covidsafe app
• privacy amendment (public health contact information) act
acknowledgment: this research is partly supported through the australian research council discovery project: dp , 're-engineering enterprise systems for microservices in the cloud.'

key: cord- -wes my e authors: masud, sarah; dutta, subhabrata; makkar, sakshi; jain, chhavi; goyal, vikram; das, amitava; chakraborty, tanmoy title: hate is the new infodemic: a topic-aware modeling of hate speech diffusion on twitter date: - - journal: nan doi: nan sha: doc_id: cord_uid: wes my e online hate speech, particularly over microblogging platforms like twitter, has emerged as arguably the most severe issue of the past decade. several countries have reported a steep rise in hate crimes fueled by malicious hate campaigns.
while the detection of hate speech is one of the emerging research areas, the generation and spread of topic-dependent hate in the information network remain under-explored. in this work, we focus on exploring user behaviour which triggers the genesis of hate speech on twitter and how it diffuses via retweets. we crawl a large-scale dataset of tweets, retweets, user activity history, and follower networks, comprising over million tweets from more than million unique users. we also collect over k contemporary news articles published online. we characterize different signals of information that govern these dynamics. our analyses differentiate the diffusion dynamics in the presence of hate from usual information diffusion. this motivates us to formulate the modelling problem in a topic-aware setting with real-world knowledge. for predicting the initiation of hate speech for any given hashtag, we propose multiple feature-rich models, with the best performing one achieving a macro f score of . . meanwhile, to predict the retweet dynamics on twitter, we propose retina, a novel neural architecture that incorporates exogenous influence using scaled dot-product attention. retina achieves a macro f -score of . , outperforming multiple state-of-the-art models. our analysis reveals the superlative power of retina to predict the retweet dynamics of hateful content compared to the existing diffusion models.
for the past half-a-decade, in synergy with the socio-political and cultural rupture worldwide, online hate speech has manifested as one of the most challenging issues of this century, transcending beyond cyberspace. many hate crimes against minority and backward communities have been directly linked with hateful campaigns circulated over facebook, twitter, gab, and many other online platforms [ ] , [ ] . online social media has provided an unforeseen speed of information spread, aided by the fact that the power of content generation is handed to every user of these platforms. extremists have exploited this phenomenon to disseminate hate campaigns to a degree where manual monitoring is too costly, if not impossible. thankfully, the research community has been observing a spike of works related to online hate speech, with a vast majority of them focusing on the problem of automatic detection of hate from online text [ ] . however, as ross et al. [ ] pointed out, even manual identification of hate speech comes with ambiguity due to the differences in the definition of hate. also, an important signal of hate speech is the presence of specific words/phrases, which vary significantly across topics/domains. tracking such a diverse socio-linguistic phenomenon in real time is impossible for automated, large-scale platforms. an alternative approach can be to track potential groups of users who have a history of spreading hate. as mathew et al. [ ] suggested, such users are often a very small fraction of the total users but generate a sizeable portion of the content. moreover, the severity of hate speech lies in the degree of its spread, and an early prediction of the diffusion dynamics may help combat online hate speech to a new extent altogether. however, only a tiny fraction of the existing literature seeks to explore the problem quantitatively. mathew et al. [ ] put up an insightful foundation for this problem by analyzing the dynamics of hate diffusion in gab.
however, they do not tackle the problem of modeling the diffusion and restrict themselves to identifying different characteristics of hate speech in gab.
hate speech on twitter: twitter, as one of the largest micro-blogging platforms with a worldwide user base, has a long history of accommodating hate speech, cyberbullying, and toxic behavior. recently, it has come down hard on such content multiple times, and a certain fraction of hateful tweets are often removed upon identification. however, a large majority of such tweets still circumvent twitter's filtering. in this work, we choose to focus on the dynamics of hate speech on twitter mainly due to two reasons: (i) the wide-spread usage of twitter compared to other platforms provides scope to grasp the hate diffusion dynamics in a more realistic manifestation, and (ii) it lets us understand how hate speech emerges and spreads even in the presence of some top-down checking measures, compared to unmoderated platforms like gab.
diffusion patterns of hate vs. non-hate on twitter: hate speech is often characterized by the formation of echo-chambers, i.e., only a small group of people engaging with such content repeatedly. in figure , we compare the temporal diffusion dynamics of hateful vs. non-hate tweets (see sections vi-a and vi-b for the details of our dataset and hate detection methods, respectively). following the standard information diffusion terminology, the set of susceptible nodes at any time instance of the spread is defined by all such nodes which have been exposed to the information (followers of those who have posted/retweeted the tweet) up to that instant but did not participate in spreading (did not retweet/like/comment). while hateful tweets are retweeted in significantly higher magnitude compared to non-hateful ones (see figure (a)), they tend to create a smaller number of susceptible users over time (see figure (b)). this is directly linked to two major phenomena. primarily, one can relate this to the formation of hate echo-chambers: hateful content is distributed among a well-connected set of users. secondarily, as we define susceptibility in terms of follower relations, hateful content might also have been diffusing along connections beyond the follow network, e.g., through paid promotion. one can also observe the differences in early growth for the two types of information; while hateful tweets acquire most of their retweets and susceptible nodes in a very short time and stall later on, non-hateful ones tend to maintain the spread, though at a lower rate, for a longer time. this characteristic can again be linked to organized spreaders of hate who tend to disseminate hate as early as possible.
topic-dependence of twitter hate: hateful content shows strong topic-affinity; topics related to politics and social issues, for example, incur much more hateful content compared to sports or science. hashtags in twitter provide an overall mapping of tweets to topics of discussion. as shown in figure , the degree of hateful content varies significantly for different hashtags. even when different hashtags share a common theme (such as #jamiaunderattack, #jamiaviolence, and #jamiacctv), they may still incur a different degree of hate. previous studies [ ] tend to denote users as hate-preachers irrespective of the topic of discussion. however, as evident in figure , the degree of hatefulness expressed by a user is dependent on the topic as well.
for example, while some users resort to hate speech concerning covid- and china, others focus on topics around the protests against the citizenship amendment act in india. (figure : the color of a cell, corresponding to a user and a hashtag, signifies the ratio of hateful to non-hate tweets posted by that user using that specific hashtag.)
exogenous driving forces: with the increasing entanglement of virtual and real social processes, it is only natural that events happening outside the social media platforms tend to shape the platform's discourse. though a small number of existing studies attempt to inquire into such inter-dependencies [ ] , [ ] , the findings are substantially motivating for problems related to modeling information diffusion and user engagement on twitter and other platforms. in the case of hate speech, exogenous signals offer an even more crucial attribute to look into, namely the global context. for both detecting and predicting the spread of hate speech over short tweets, the knowledge of context is likely to play a decisive role.
present work: based on the findings of the existing literature and the analysis we presented above, here we attempt to model the dynamics of hate speech spread on twitter. we separate the process of spread into hate generation (asking who will start a hate campaign) and retweet diffusion of hate (who will spread an already started hate campaign via retweeting). to the best of our knowledge, this is the very first attempt to delve into the predictive modeling of online hate speech. our contributions can be summarized as follows:
) we formalize the dynamics of hate generation and retweet spread on twitter, subsuming the activity history of each user, the signals propagated by the localized structural properties of the information network of twitter induced by follower connections, as well as global endogenous and exogenous signals (events happening inside and outside of twitter) (see section iii).
) we present a large dataset of tweets, retweets, user activity history, and the information network of twitter, covering versatile hashtags which trended very recently. we manually annotate a significant subset of the data for hate speech. we also provide a corpus of contemporary news articles published online (see section vi-a for more details).
) we unsheathe a rich set of features manifesting the signals mentioned above to design multiple prediction frameworks which forecast, given a user and a contemporary hashtag, whether the user will write a hateful post or not (section iv). we provide an in-depth feature ablation and ensemble methods to analyze our proposed models' predictive capability, with the best performing one resulting in a macro f -score of . .
) we propose retina (retweeter identifier network with exogenous attention), a neural architecture to predict potential retweeters given a tweet (section v-b). retina encompasses an attention mechanism which dictates the prediction of retweeters based on a stream of contemporary news articles published online. features representing hateful behavior encoded within the given tweet as well as the activity history of the users further help retina to achieve a macro f -score of . , significantly outperforming several state-of-the-art retweet prediction models.
we have made public our datasets and code along with the necessary instructions and parameters, available at https://github.com/lcs -iiitd/retina.
hate speech detection.
in recent years, the research community has been keenly interested in better understanding, detecting, and combating hate speech on online media. starting with the basic feature-engineered logistic regression models [ ] , [ ] to the latest ones employing neural architectures [ ] , a variety of automatic online hate speech detection models have been proposed across languages [ ] . to determine hateful text, most of these models utilize a static, lexicon-based approach and consider each post/comment in isolation. with a lack of context (both in the form of an individual's prior indulgence in the offense and the current world view), the models trained on previous trends perform poorly on new datasets. while linguistic and contextual features are essential factors of a hateful message, the destructive power of hate speech lies in its ability to spread across the network. however, only recently have researchers started using network-level information for hate speech detection [ ] , [ ] . rathpise and adji [ ] proposed methods to handle class imbalance in hate speech classification. a recent work showed how anti-social behavior on social media during covid- led to the spread of hate speech. awal et al. [ ] coined the term 'disability hate speech' and showed its social, cultural and political contexts. ziems et al. [ ] explained how covid- tweets increased racism, hate, and xenophobia in social media. while our work does not involve building a new hate speech detection model, hate detection underpins any work on hate diffusion in the first place. inspired by existing research, we also incorporate hate lexicons as a feature for the diffusion model. the lexicon is curated from multiple sources and manually pruned to suit the indian context [ ] . meanwhile, to overcome the problem of context, we utilize the timeline of a user to determine her propensity towards hate speech.
information diffusion and microscopic prediction. predicting the spread of information on online platforms is crucial in understanding the network dynamics, with applications in marketing campaigns, rumor spreading/stalling, route optimization, etc., the latest in this family of diffusion models being chassis [ ] . on the other end of the spectrum, the sir model [ ] effectively captures the presence of r (recovered) nodes in the system, which are no longer active due to information fatigue. even though limited in scope, the sir model serves as an essential baseline for all diffusion models. among other techniques, a host of studies employ social media data for both macroscopic (size and popularity) and microscopic (next user(s) in the information cascade) prediction. while highly popular, both deepcas [ ] and deephawkes [ ] focus only on the size of the overall cascade. similarly, khosla et al. [ ] utilized social cues to determine the popularity of an image on flickr. while independent cascade (ic) based embedding models [ ] , [ ] led the initial work in supervised-learning-based microscopic cascade prediction, they failed to capture the cascade's temporal history (either directly or indirectly). meanwhile, yang et al. [ ] presented a neural diffusion model for microscopic prediction, which employs a recurrent neural architecture to capture the history of the cascade. these models focus on predicting the next user in the cascade from a host of potential candidates. in this regard, topolstm [ ] considers only the previously seen nodes in any cascade as the next candidate, without using timestamps as a feature.
this approximation works well under limited availability of network information and the absence of cascade metadata. meanwhile, forest [ ] considers all the users in the global graph (irrespective of one-hop) as potential users, employing a time-window based approach. the work by wang et al. [ ] lies midway between topolstm and forest, in that it does not consider any external global graph as input, but employs a temporal, two-level attention mechanism to predict the next node in the cascade. zhou et al. [ ] compiled a detailed outline of recent advances in cascade prediction. compared to the models discussed above for microscopic cascade prediction, which aim to answer who will be the next participant in the cascade, our work aims to determine whether a follower of a user will retweet (participate in the cascade) or not. this converts our use case into a binary classification problem and adds negative sampling (in the form of inactive nodes), taking the proposed model closer to a real-world scenario consisting of active and passive social media users.
the spread of hate. an exploratory analysis by mathew et al. [ ] revealed exciting characteristics of the breadth and depth of hate vs. non-hate diffusion. however, their methodology separates the non-haters from haters and studies the diffusion of the two cascades independently. real-world interactions are more convoluted, with the same communication thread containing hateful, counter-hateful, and non-hateful comments. thus, independent diffusion studies, while adequate for the exploratory analysis of hate, cannot be directly extrapolated for predictive analysis of hate diffusion. the need is a model that captures the hate signals at the user and/or group level. by taking into account the user's timeline and his/her network traits, we aim to capture more holistic hate markers.
exogenous influence. as early as , myers et al. [ ] showed that external stimuli drive one-third of the information diffusion on twitter. later, hu et al. [ ] proposed a model for predicting user engagement on twitter that is factored by user engagement in real-world events. from employing world news data for enhancing language models [ ] to boosting the impact of online advertisement campaigns [ ] , exogenous influence has been successfully applied in a wide variety of tasks. concerning social media discourse, both de et al. [ ] in opinion mining and dutta et al. [ ] in chatter prediction corroborated the superiority of models that consider exogenous signals. since our data on twitter was collected based on trending indian hashtags, it becomes crucial to model exogenous signals, some of which may have triggered a trend in the first place. while a one-to-one mapping of news keywords to trending keywords is challenging to obtain, we collate the most recent (time-window) news w.r.t. a source tweet as our ground-truth. to our knowledge, this is the first retweet prediction model to consider external influence.
an information network of twitter can be defined as a directed graph g = {u, e}, where every user corresponds to a unique node u i ∈ u, and there exists an ordered pair (u i , u j ) ∈ e if and only if the user corresponding to u j follows user u i (table i summarizes important notations and denotations; the recoverable entries include p u i : probability of u i retweeting (static vs. the j-th interval); x t , x n : feature tensors for tweet and news; x t,n : output from exogenous attention.)
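a minimal sketch of constructing this information network and reading off the follower-path feature used later for retweet prediction; networkx is used here purely for illustration and is not prescribed by the paper.

```python
# build the directed follow graph: edge (u_i, u_j) means u_j follows u_i,
# each edge carrying unit weight as described above.
import networkx as nx

def build_information_network(follow_pairs):
    """follow_pairs: iterable of (followee, follower) user ids."""
    g = nx.DiGraph()
    g.add_edges_from(((followee, follower) for followee, follower in follow_pairs),
                     weight=1)
    return g

def shortest_path_feature(g, root_user, candidate):
    """shortest path length from the root user to a candidate retweeter."""
    try:
        return nx.shortest_path_length(g, source=root_user, target=candidate)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return float("inf")  # candidate not reachable from the root user
```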
typically, the visible information network of twitter does not associate the follow relation with any further attributes; therefore, any two edges in e are indistinguishable from each other. we associate unit weight with every e ∈ e. every user in the network acts as an agent of content generation (tweeting) and diffusion (retweeting). for every user u i at time t , we associate an activity history h i,t . the information received by user u i has three different sources: (a) peer signals (s p i ): the information network g governs the flow of information from node to node such that any tweet posted by u i is visible to every user u j if (u i , u j ) ∈ e; (b) non-peer endogenous signals (s en ): trending hashtags, promoted content, etc., that show up on the user's feed even in the absence of a peer connection; (c) exogenous signals (s ex ): apart from the twitter feed, every user interacts with external world-events directly (as a participant) or indirectly (via news, blogs, etc.).
hate generation. the problem of modeling hate generation can be formulated as assigning a probability to each user that signifies their likelihood to post a hateful tweet. with our hypothesis of hateful behavior being a topic-dependent phenomenon, we formalize the modeling problem as learning the parametric function of eq. , where t is a given topic, t is the instance up to which we obtain the observable history of u i , d is the dimensionality of the input feature space, and θ is the set of learnable parameters. though ideally p (u i |t ) should be dependent on s p i as well, the complete follower network for twitter remains mostly unavailable due to account settings, privacy constraints, inefficient crawling, etc.
hate diffusion. as already stated, we characterize diffusion as the dynamic process of retweeting in our context. given a tweet τ (t ) posted by some user u i , we formulate the problem as predicting the potential retweeters within the interval [t , t + ∆t]. assuming the probability density of a user u j retweeting τ at time t to be p(t), the retweet prediction problem translates to learning the parametric function of eq. . eq. is the general form of a parametric equation describing retweet prediction. in our setting, the signal components s p j , h j,t , and the features representing the tweet τ incorporate the knowledge of hatefulness. henceforth, we call τ the root tweet and u i the root user. it is to be noted that the features representing the peer, non-peer endogenous, and exogenous signals in eqs. and may differ due to the difference in problem setting.
beyond organic diffusion. the task of identifying potential retweeters of a post on twitter is not straightforward. in retrospect, the event of a user retweeting a tweet implies that the user must have been an audience of the tweet at some point of time (similar to 'susceptible' nodes of contagion spread in the sir/sis models [ ] , [ ] ). for any user, if at least one of his/her followees engages with the retweet cascade, then the subject user becomes susceptible. that is, in an organic diffusion, between any two users u i , u j there exists a finite path u i , u i+1 , . . . , u j in g such that each user (except u i ) in this path is a retweeter of the tweet by u i . however, due to account privacy, etc., one or more nodes within this path may not be visible. moreover, content promoted by twitter, trending topics, and content searched for by users independently may diffuse alongside the organic diffusion path.
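the bodies of eq. and eq. do not survive in this text; the following is a plausible reconstruction of their general form, assembled only from the inputs and parameters named above, and the exact formulation in the original paper may differ.

```latex
% hedged reconstruction of the two parametric formulations (general form only)
\begin{align}
  p(u_i \mid T) &= f\!\left(h_{i,t_0},\, s^{en},\, s^{ex};\, \theta\right),
  \qquad f:\ \mathbb{R}^{d} \rightarrow [0,1], \tag{1}\\[4pt]
  \int_{t_0}^{t_0+\Delta t} p(t)\,\mathrm{d}t &=
  f\!\left(h_{j,t_0},\, s^{p}_{j},\, s^{en},\, s^{ex},\, \tau;\, \theta\right). \tag{2}
\end{align}
```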
searching for such retweeters is impossible without explicit knowledge of these phenomena. hence, we primarily restrict our retweet prediction to the organic diffusion, though we experiment with retweeters not in the visibly organic diffusion cascade to see how our models handle such cases.
to realize eq. , we signify topics as individual hashtags. we rely purely on manually engineered features for this task so that a rigorous ablation study and analysis produce explainable knowledge regarding this novel problem. the extracted features instantiate different input components of f in eq. . we formulate this task in a static manner, i.e., assuming that we are predicting at an instance t , we want to predict the probability of the user posting a hateful tweet within [t , ∞]. while training and evaluating, we set t to be right before the actual tweeting time of the user. the activity history of user u i , signified by h i,t , is substantiated by the following features:
• we use unigram and bigram features weighted by tf-idf values from the most recent tweets posted by u i to capture their recent topical interest. to reduce the dimensionality of the feature space, we keep the top features sorted by their idf values.
• to capture the history of hate generation by u i , we compute two different features over her most recent tweets: (i) the ratio of hateful vs. non-hate tweets, and (ii) a hate lexicon vector hl = {h i | h i ∈ i + and i = , . . . , |h|}, where h is a dictionary of hate words and h i is the frequency of the i-th lexicon entry from h among the tweet history (a code sketch of these two feature groups appears below).
• users who receive more attention from fellow users for hate propagation are more likely to generate hate. therefore, we take the ratio of retweets of previous hateful tweets to non-hateful ones by u i . we also take the ratio of the total number of retweets on hateful and non-hateful tweets of u i .
• follower count and date of account creation of u i .
• number of topics (hashtags) u i has tweeted on up to t.
we compute doc vec [ ] representations of the tweets, along with the hashtags present in them as individual tokens. we then compute the average cosine similarity between the user's recent tweets and the word vector representation of the hashtag; this serves as the topical relatedness of the user towards the given hashtag. to incorporate the information of trending topics over twitter, we supply the model with a binary vector representing the top trending hashtags for the day the tweet is posted. we compute the average tf-idf vector for the most recent news headlines from our corpus posted before the time of the tweet; again, we select the top features. using the above features, we implement six different classification models (and their variants). details of the models are provided in section vi-c.
v. retweet prediction. while realizing eq. for retweeter prediction, we formulate the task in two different settings: the static retweeter prediction task, where t is fixed and ∆t is ∞ (i.e., all the retweeters irrespective of their retweet time), and the dynamic retweeter prediction task, where we predict on successive time intervals. for these tasks, we rely on features both designed manually and extracted in an unsupervised/self-supervised manner. for the task of retweet prediction, we extract features representing the root tweet itself, as well as the signals of eq. corresponding to each user u i (for which we predict the possibility of retweeting). henceforth, we indicate the root user by u .
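as a concrete illustration of two of the user-history feature groups listed above (the tf-idf weighted n-grams over recent tweets and the hate lexicon frequency vector hl), the following sketch uses scikit-learn; the top-k cut-off and the per-user fitting are simplifications made for illustration and are not the paper's exact pipeline.

```python
# hedged sketch of two user-history features: tf-idf weighted unigrams/bigrams
# over a user's recent tweets, and the hate lexicon frequency vector hl.
import re
from sklearn.feature_extraction.text import TfidfVectorizer

def tfidf_history_features(recent_tweets, top_k=500):
    # top_k stands in for the paper's (elided) number of retained features;
    # in practice the vectorizer would be fit on the whole corpus, not per user.
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=top_k)
    matrix = vectorizer.fit_transform([" ".join(recent_tweets)])
    return matrix.toarray()[0]

def hate_lexicon_vector(recent_tweets, hate_lexicon):
    """hl[i] = frequency of the i-th lexicon entry in the user's tweet history."""
    text = " ".join(t.lower() for t in recent_tweets)
    return [len(re.findall(r"\b" + re.escape(term.lower()) + r"\b", text))
            for term in hate_lexicon]
```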
here, we incorporate s p i using two different features: the shortest path length from u to u i in g, and the number of times u i has retweeted tweets by u . all the features representing h i,t and s en remain the same as described in section iv. we incorporate two sets of features representing the root tweet τ : the hate lexicon vector similar to section iv-a, and the top features; we varied the size of features from to , and the best combination was found to be . for the retweet prediction task, we incorporate the exogenous signal in two different ways. to implement the attention mechanism of retina, we use doc vec representations of the news articles as well as the root tweet; for the rest of the models, we use the same feature set as section iv-d.
guided by eq. , retina exploits the features described in section v-a for both static and dynamic prediction of retweeters.
exogenous attention. to incorporate external information as an assisting signal to model diffusion, we use a variation of scaled dot-product attention [ ] in retina (see figure ). (figure : (b) static prediction of retweeters: to predict whether u j will retweet, the input feature x uj is normalized and passed through a feed-forward layer, concatenated with x t,n , and another feed-forward layer is applied to predict the retweeting probability p uj ; (c) dynamic retweet prediction: in this case, retina predicts the user retweet probability for consecutive time intervals, and instead of the last feed-forward layer used in the static prediction, we use a gru layer.) given the feature representation of the tweet x t and the news feature sequence x n = {x n , x n , . . . , x n k }, we compute three tensors q t , k n , and v n , respectively, as q t = x t · (− , ) w q , k n = x n · (− , ) w k , and v n = x n · (− , ) w v (eq. ), where w q , w k , and w v are learnable parameter kernels (we denote them as belonging to the query, key, and value dense layers, respectively, in figure ). the operation (·) | (− , ) (·) signifies tensor contraction according to the einstein summation convention along the specified axes; in eq. , (− , ) signifies the last and first axis of the first and second tensor, respectively. therefore, each of w q , w k , and w v is a two-dimensional tensor with hdim columns (last axis). next, we compute the attention weight tensor a between the tweet and the news sequence as a = softmax(q t · k n ⊤ ) (eq. ), where softmax(x[. . . , i, j]) = e x[...,i,j] / Σ j e x[...,i,j] . further, to avoid saturation of the softmax activation, we scale each element of a by hdim −0.5 [ ] . the attention weight is then used to produce the final encoder feature representation x t,n by computing the weighted average of v n as x t,n = a · v n (eq. ). retina is expected to aggregate the exogenous signal exposed by the sequence of news inputs according to the feature representation of the tweet into x t,n , using the operations mentioned in eqs. - via tuning the parameter kernels.
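a minimal numpy sketch of this exogenous attention: the tweet representation queries the news sequence and the attention-weighted average of the news values is returned. note that, following the standard formulation, the sketch applies the hdim^-0.5 scaling to the scores before the softmax; all dimensions and names are illustrative.

```python
import numpy as np

def exogenous_attention(x_tweet, x_news, w_q, w_k, w_v):
    """x_tweet: (d_t,); x_news: (k, d_n); w_q: (d_t, hdim); w_k, w_v: (d_n, hdim)."""
    q = x_tweet @ w_q                 # query from the tweet, shape (hdim,)
    k = x_news @ w_k                  # keys from the news sequence, (k, hdim)
    v = x_news @ w_v                  # values from the news sequence, (k, hdim)
    hdim = q.shape[-1]
    scores = (k @ q) / np.sqrt(hdim)  # scaled dot product, one score per news item
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()          # softmax over the news sequence
    return weights @ v                # x_{t,n}: weighted average of the values
```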
final prediction. with s ex being represented by the output of the attention framework, we incorporate the features discussed in section v-a in retina to subsume the rest of the signals (see eq. ). for the two separate modes of retweeter prediction (i.e., static and dynamic), we implement two different variations of retina. for the static prediction of retweeters, retina predicts the probability of each of the users u , u , . . . , u n to retweet the given tweet with no temporal ordering (see figure (b)). the feature vector x ui corresponding to user u i is first normalized and mapped to an intermediate representation using a feed-forward layer. it is then concatenated with the output of the exogenous attention component, x t,n , and finally, another feed-forward layer with sigmoid nonlinearity is applied to compute the probability p ui . as opposed to the static case, in the dynamic setting retina predicts the probability of every user u i to retweet within a time interval [t + ∆t i , t + ∆t i+ ), with t being the time the tweet was published and ∆t = . to capture the temporal dependency between predictions in successive intervals, we replace the last feed-forward layer with a gated recurrent unit (gru), as shown in figure (c). we experimented with other recurrent architectures as well; performance degraded with a simple rnn, and there was no gain with an lstm. cost/loss function. in both settings, the task translates to a binary classification problem of deciding whether a given user will retweet or not. therefore, we use a standard (class-weighted) binary cross-entropy loss l to train retina: l = −(w · t · log p + (1 − t) · log(1 − p)), where t is the ground truth, p is the predicted probability (p ui in the static and p ui j in the dynamic setting), and w is the weight given to the positive samples to deal with class imbalance.
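a minimal numpy sketch of this class-weighted binary cross-entropy is given below; the weight w multiplies the positive term only, mirroring the description above, while the epsilon clipping and the toy values are implementation details added here purely for illustration.

```python
# Minimal sketch of the class-weighted binary cross-entropy described above.
import numpy as np

def weighted_bce(t, p, w, eps=1e-7):
    """t: 0/1 ground truth, p: predicted probabilities, w: positive-class weight."""
    p = np.clip(p, eps, 1.0 - eps)            # avoid log(0)
    loss = -(w * t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
    return loss.mean()

t = np.array([1, 0, 0, 0, 1])
p = np.array([0.8, 0.1, 0.3, 0.2, 0.4])
print(weighted_bce(t, p, w=3.0))
```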
we initially started collecting data based on topics, which led to a tweet corpus spanning multiple years. to narrow down our time frame and ease the mapping of tweets to news, we restricted our time span from - - to - - and made use of trending hashtags. using twitter's official api, we tracked and crawled for trending hashtags each day within this duration. overall, we obtained , tweets from , users. we also crawled the retweeters for each tweet along with the timestamps. table ii describes the hashtag-wise detailed statistics of the data. to build the information network, we collected the followers of each user up to a depth of , resulting in a total of , , unique users in our dataset. we also collected the activity history of the users, resulting in a total of , , tweets in our dataset. one should note that the lack of a comprehensive dataset (containing textual, temporal, and network signals all in one) is the primary reason why we decided to collect our own dataset in the first place. we also crawled the online news articles published within this span using the news-please crawler [ ]. we managed to collect a total of , news articles for this period. after filtering for language, title and date, we were left with , processed items. their headlines were used as the source of the exogenous signal. we employ three professional annotators who have experience in analyzing online hate speech to annotate the tweets manually. all of these annotators belong to an age group of - years and are active on twitter. as contextual knowledge of real-world events plays a crucial role in identifying hate speech, we ensure that the annotators are well aware of the events related to the hashtags and topics. annotators were asked to follow twitter's policy as a guideline for identifying hateful behavior. we annotated a total of , tweets with an inter-annotator agreement of . (krippendorff's α). the low inter-annotator agreement is on par with most hate speech annotation efforts to date, pointing to the hardness of the task even for human subjects. this further strengthens the need for contextual knowledge as well as for exploiting beyond-the-text dynamics. we select the final tags based on majority voting. based on this gold-standard annotated data, we train three different hate speech classifiers based on the designs given by davidson et al. [ ] (dubbed the davidson model), waseem and hovy [ ], and pinkesh et al. [ ]. with an auc score of . and macro-f of . , the davidson model emerges as the best performing one. when the existing pre-trained davidson model was tested on our annotated dataset, it achieved . auc and . macro-f . this highlights both the limitations of existing hate detection models in capturing newer contexts and the importance of manual annotation and fine-tuning. we use the fine-tuned model to annotate the rest of the tweets in our dataset (the % of hateful tweets for each hashtag is reported in table ii). we use the machine-annotated tags for the features and training labels in our proposed models only, while the hate generation models are tested solely on gold-standard data. along with the manual annotation and the trained hate detection model, we use a dictionary of hate lexicons proposed in [ ]. it contains a total of words/phrases signaling a possible existence of hatefulness in a tweet. examples of slur terms used in the lexicon include words such as harami (bastard), jhalla (faggot), haathi (elephant/fat). using the above terms is derogatory and a direct offense. in addition, the lexicon has some colloquial terms such as mulla (muslim), bakar (gossip), aktakvadi (terrorist), jamai (son-in-law), which may carry a hateful sentiment depending on the context in which they are used. to experiment on our hate generation prediction task, we use a total of , tweets (which have at least news articles mapping to them from the time of their posting) coming from , users to construct the ground truth. with an : train-test split, there are hateful tweets among , in the training data, and out of , in the testing data. to deal with the severe class imbalance of the dataset, we use both upsampling of positive samples and downsampling of negative samples. with all the features discussed in section iv, the full size of the feature vector is , . we experimented with all our proposed models on this full set of features as well as with dimensionality reduction techniques applied to it. we use principal component analysis (pca) with the number of components set to . also, we conduct experiments selecting k-best features (k = ) using mutual information. we implement a total of six different classifiers using support vector machines (with linear and rbf kernels), logistic regression, decision tree, adaboost, and xgboost [ ]. parameter settings for each of these are reported in table iii. all of the models, pca, and feature selection are implemented using scikit-learn. the activity of retweeting, too, shows a skewed pattern similar to hate speech generation. while the maximum number of retweets for a single tweet is in our dataset, the average remains . . we use only those tweets which have more than one retweet and at least news articles mapping to them from the time of posting. with an : train-test split, this results in a total of , and samples for training and testing. for all the doc2vec-generated feature vectors related to tweets and news headlines, we set the dimensionality to and , respectively. for retina, we set the parameter hdim and all the intermediate hidden sizes for the rest of the feed-forward (except the last one generating logits) and recurrent layers to (see section v-b). hyperparameter tuning of retina. for both settings (i.e., static and dynamic prediction of retweeters), we used mini-batch training of retina, with both adam and sgd optimizers.
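as an illustration of this kind of mini-batch training and tuning loop, the keras sketch below trains a small stand-in binary classifier with either adam or sgd while varying the batch size and learning rate; the architecture, grid values, and class weights are toy placeholders rather than retina's actual configuration, which is reported in the following paragraphs.

```python
# Toy Keras sketch of mini-batch training with Adam vs. SGD over a small
# hyperparameter grid; not RETINA's architecture or configuration.
import numpy as np
import tensorflow as tf

def build_model(input_dim, hidden=64):
    inputs = tf.keras.Input(shape=(input_dim,))
    x = tf.keras.layers.Dense(hidden, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)

x = np.random.rand(256, 32).astype("float32")
y = np.random.randint(0, 2, size=(256, 1)).astype("float32")

for opt_name, lr, batch_size in [("adam", 1e-3, 32), ("sgd", 1e-2, 64)]:
    optimizer = (tf.keras.optimizers.Adam(learning_rate=lr) if opt_name == "adam"
                 else tf.keras.optimizers.SGD(learning_rate=lr))
    model = build_model(32)
    model.compile(optimizer=optimizer, loss="binary_crossentropy")
    # class_weight upweights positive samples, mirroring the weight w in the loss
    model.fit(x, y, batch_size=batch_size, epochs=2,
              class_weight={0: 1.0, 1: 3.0}, verbose=0)
    print(opt_name, float(model.evaluate(x, y, verbose=0)))
```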
we varied the batch size within , and , with the best results for a batch size of for the static mode and for the dynamic mode. we also varied the learning rates within a range − to − , and chose the best one with learning rate − using the sgd optimizer for the dynamic model. the static counterpart produced the best results with adam optimizer [ ] using default parameters. to deal with the class imbalance, we set the parameter w in eq. as w = λ(log c − log c + ), where c and c + are the counts for total and positive samples, respectively in the training dataset, and λ is a balancing constant which we vary from to . with . steps. we found the best configurations with λ = . and λ = . for the static and dynamic modes respectively. https://www.tensorflow.org/api docs/python/tf/keras/optimizers/sgd https://www.tensorflow.org/api docs/python/tf/keras/optimizers/adam in the absence of external baselines for predicting hate generation probability due to the problem's novelty, we explicitly rely on ablation analyses of the models proposed for this task. for retweet dynamics prediction, we implement external baselines and two ablation variants of retina. since information diffusion is a vast subject, we approach it from two perspectives -one is the set of rudimentary baselines (sir, general threshold), and the other is the set of recently proposed neural models. sir [ ] : the susceptible-infectious-recovered (removed) is one of the earliest predictive models for contagion spread. two parameters govern the model -transmission rate and recovery rate, which dictate the spread of contagion (retweeting in our case) along with a social/information network. threshold model [ ] : this model assumes that each node has threshold inertia chosen uniformly at random from the interval [ , ]. a node becomes active if the weighted sum of its active neighbors exceeds this threshold. using the same feature set as described in section v-a, we employ four classifiers -logistic regression, decision tree, linear svc, and random forest (with estimators). all of these models are used for the static mode of retweet prediction only. features representing exogenous signals are engineered in the same way as described in section iv-d. to overcome the feature engineering step involving combinations of topical, contextual, network, and user-level features, neural methods for information diffusion have gained popularity. while these methods are all focused on determining only the next set of users, they are still important to measure the diffusion performance of retina. topolstm [ ] : it is one of the initial works to consider recurrent models in generating the next user prediction probabilities. the model converts the cascades into dynamic dags (capturing the temporal signals via node ordering). the senderreceiver based rnn model captures a combination of active node's static score (based on the history of the cascade), and a dynamic score (capturing future propagation tendencies). forest [ ] : it aims to be a unified model, performing the microscopic and the macroscopic cascade predictions combining reinforcement learning (for macroscopic) with the recurrent model (for microscopic). by considering the complete global graph, it performs graph sampling to obtain the structural context of a node as an aggregate of the structural context of its one or two hops neighbors. in addition, it factors the temporal information via the last m seen nodes in the cascade. hidan [ ] : it does not explicitly consider a global graph as input. 
any information loss due to the absence of a global graph is substituted by temporal information utilized in the form of ordered time difference of node infection. since hidan does not employ a global graph, like topolstm, it too uses the set of all seen nodes in the cascade as candidate nodes for prediction. we exercise extensive feature ablation to examine the relative importance of different feature sets. among the six different algorithms we implement for this task, along with different sampling and feature reduction methods, we choose the best performing model for this ablation study. following eq. , we remove the feature sets representing h i,t , s ex , s en , and t (see section iv for corresponding features) in each trial and evaluate the performance. to investigate the effectiveness of the exogenous attention mechanism for predicting potential retweeters, we remove this component and experiment on static as well as the dynamic setting of retina. evaluation of classification models on highly imbalanced data needs careful precautions to avoid classification bias. we use multiple evaluation metrics for both the tasks: macro averaged f score (macro-f ), area under the receiver operating characteristics (auc), and binary accuracy (acc). as the neural baselines tackle the problem of retweet prediction as a ranking task, we improvise the evaluation of retina to make it comparable with these baselines. we rank the predicted probability scores (p ui and p ui j in static and dynamic settings, respectively) and compute mean average precision at topk positions (map@k) and binary hits at top-k positions (hits@k). table iv presents the performances of all the models to predict the probability of a given user posting a hateful tweet using a given hashtag. it is evident from the results that, all six models suffer from the sharp bias in data; without any classspecific sampling, they tend to lean towards the dominant class (non-hate in this case) and result in a low macro-f and auc compared to very high binary accuracy. svm with rbf-kernel outperforms the rest when no upsampling or downsampling is done, with a macro-f of . (auc . ). effects of sampling. downsampling the dominant classes result in a substantial leap in the performance of all the models. the effect is almost uniform over all the classifiers except xgboost. in terms of macro-f , decision tree sets the best performance altogether for this task as . . however, the rest of the models lie in a very close range of . - . macro-f . while the downsampling performance gains are explicitly evident, the effects of upsampling the dominated class are less intuitive. for all the models, upsampling deteriorates macro-f by a large extent, with values in the range . - . . however, the auc scores improve by a significant margin for all the models with upsampling except decision tree. adaboost achieves the highest auc of . with upsampling. dimensionality reduction of feature space. our experiments with pca and k-best feature selection by mutual information show a heterogeneous effect on different models. while the only svm with linear kernel shows some improvement with pca over the original feature set, the rest of the models observe considerable degradation of macro-f . however, svm with rbf kernel achieves the best auc of . with pca. with top-k best features, the overall gain in performance is not much significant except decision tree. 
we also experiment with combinations of different sampling and feature reduction methods, but none of them achieve a significant gain in performance. ablation analysis. we choose decision tree with downsampling of dominant class as our best performing model (in terms of macro-f score) and perform ablation analysis. table v presents the performance of the model with each feature group removed in isolation, along with the full model. evidently, for predicting hate generation, features representing exogenous signals and user activity history are most important. removal of the feature vector signifying trending hashtags, which represent the endogenous signal in our case, also worsens the performance to a significant degree. table vi summarizes the performances of the competing models for the retweet prediction task. here again, binary accuracy presents a very skewed picture of the performance due to class imbalance. while retina in dynamic setting outperforms the rest of the models by a significant margin for all the evaluation metrics, topolstm emerges as the best baseline in terms of both map@ and hits@ . in figure , we compare retina in static and dynamic setting with topolstm in terms of hits@k for different values of k. for smaller values of k, retina largely outperforms topolstm, in both dynamic and static setting. however, with increasing k-values, the three models converge to very similar performances. figure provides an important insight regarding the retweet diffusion modeling power of our proposed framework retina. our best performing baseline, topolstm largely fails to capture the different diffusion dynamics of hate speech in contrast to non-hate (map@ . for non-hate vs. . for hate). on the other hand, retina achieves map@ scores . and . in dynamic ( . and . in static) settings to predict the retweet dynamics for hate and non-hate contents, respectively. one can readily infer that our wellcurated feature design by incorporating hate signals along with the endogenous, exogenous, and topic-oriented influences empowers retina with this superior expressive power. among the traditional baselines, logistic regression gives comparable macro f -score to the best static model; however, owing to memory limitations it could not be trained on news set larger than per tweet. similarly, svm based models could not incorporate even news items per tweet (memory limitation). meanwhile, an ablation on news size gave best results at for both static and dynamic models. we find that the contribution of the exogenous signal(i.e the news items) plays a vital role in retweet prediction, much similar to our findings in table v for predicting hate generation. with the exogenous attention component removed in static as well as dynamic settings (retina-s † and retina-d † , respectively, in table vi) , performance drops by a significant margin. however, the performance drop is more significant in retina-d † for ranking users according to retweet probability (map@k and hits@k). the impact of exogenous signals on macro-f is more visible in the traditional models. to observe the performance of retina more closely in the dynamic setting, we analyse its performance over successive prediction intervals. figure shows the ratio between the predicted and the actual number of retweets arrived at different intervals. as clearly evident, the model tends to be nearly perfect in predicting new growth with increasing time. 
high error rate at the initial stage is possibly due to the fact that the retweet dynamics remains uncertain at first and becomes more predictable as increasing number of people participate over time. a similar trend is observed when we compare the performance of retina in static setting with varying size of actual retweet cascades. figure shows that retina-s performs better with increasing size of the cascade. in addition, we also vary the number of tweets posted by a user. figure shows that the performance of retina in both static and dynamic settings increases by varying history size from to tweets. afterward, it either drops or remains the same. our attempt to model the genesis and propagation of hate on twitter brings forth various limitations posed by the problem itself as well as our modeling approaches. we explicitly cover such areas to facilitate the grounds of future developments. we have considered the propagation of hateful behavior via retweet cascades only. in practice, there are multiple other forms of diffusion present, and retweet only constitutes a subset of the full spectrum. users susceptible to hateful information often propagate those via new tweets. hateful tweets are often counteracted with hate speech via reply cascades. even if not retweeted, replied, or immediately influencing the generation of newer tweets, a specific hateful tweet can readily set the audience into a hateful state, which may later develop repercussions. identification of such influences would need intricate methods of natural language processing techniques, adaptable to the noisy nature of twitter data. as already discussed, online hate speech is vastly dynamic in nature, making it difficult to identify. depending on the topic, time, cultural demography, target group, etc., the signals of hate speech change. thus models like retina which explicitly uses hate-based features to predict the popularity, need updated signaling strategy. however, this drawback is only evident if one intends to perceive such endeavors as a simple task of retweet prediction only. we, on the other hand, focus on the retweet dynamics of hateful vs. non-hateful contents which presumes the signals of hateful behavior to be well-defined. the majority of the existing studies on online hate speech focused on hate speech detection, with a very few seeking to analyze the diffusion dynamics of hate on large-scale information networks. we bring forth the very first attempt to predict the initiation and spread of hate speech on twitter. analyzing a large twitter dataset that we crawled and manually annotated for hate speech, we identified multiple key factors (exogenous information, topic-affinity of the user, etc.) that govern the dissemination of hate. based on the empirical observations, we developed multiple supervised models powered by rich feature representation to predict the probability of any given user tweeting something hateful. we proposed retina, a neural framework exploiting extra-twitter information (in terms of news) with attention mechanism for predicting potential retweeters for any given tweet. comparison with multiple state-of-the-art models for retweeter prediction revealed the superiority of retina in general as well as for predicting the spread of hateful content in particular. with specific focus of our work being the generation and diffusion of hateful content, our proposed models rely on some general textual/network-based features as well as features signaling hate speech. 
a possible future work can be to replace hate speech with any other targeted phenomenon like fraudulent, abusive behavior, or specific categories of hate speech. however, these hate signals require a manual intervention when updating the lexicons or adding tropical hate tweets to retrain the hate detection model. while the features of the end-to-end model appear to be highly engineered, individual modules take care of respective preprocessing. in this study, the mode of hate speech spread we primarily focused on is via retweeting, and therefore we restrict ourselves within textual hate. however, spreading hateful contents packaged by an image, a meme, or some invented slang are some new normal of this age and leave the space for future studies. report of the independent international factfinding mission on myanmar fanning the flames of hate: social media and hate crime a survey on automatic detection of hate speech in text measuring the reliability of hate speech annotations: the case of the european refugee crisis spread of hate speech in online social media deep exogenous and endogenous influence combination for social chatter intensity prediction information diffusion and external influence in networks hateful symbols or hateful people? predictive features for hate speech detection on twitter automated hate speech detection and the problem of offensive language deep learning for hate speech detection in tweets a hierarchically-labeled portuguese hate speech dataset arhnet -leveraging community interaction for detection of religious hate speech in arabic the effects of user features on twitter hate speech detection handling imbalance issue in hate speech classification using sampling-based methods on analyzing antisocial behaviors amid covid- pandemic racism is a virus: anti-asian hate and counterhate in social media during the covid- crisis mind your language: abuse and offense detection for code-switched languages chassis: conformity meets online information diffusion containing papers of a mathematical and physical character deepcas: an end-to-end predictor of information cascades deephawkes: bridging the gap between prediction and understanding of information cascades what makes an image popular? 
representation learning for information diffusion through social networks: an embedded cascade model a novel embedding method for information diffusion prediction in social network big data neural diffusion model for microscopic cascade prediction topological recurrent neural network for diffusion prediction multi-scale information diffusion prediction with reinforced recurrent networks hierarchical diffusion attention network a survey of information cascade analysis: models, predictions and recent advances predicting user engagement on twitter with real-world events ccnet: extracting high quality monolingual datasets from web crawl data event triggered social media chatter: a new modeling framework demarcating endogenous and exogenous opinion diffusion process on social networks a deterministic model for gonorrhea in a nonhomogeneous population distributed representations of sentences and documents attention is all you need news-please: a generic news crawler and extractor xgboost: a scalable tree boosting system adam: a method for stochastic optimization maximizing the spread of influence through a social network key: cord- -vmmme y authors: shen, meng; wei, yaqian; li, tong title: bluetooth-based covid- proximity tracing proposals: an overview date: - - journal: nan doi: nan sha: doc_id: cord_uid: vmmme y large-scale covid- infections have occurred worldwide, which has caused tremendous impact on the economy and people's lives. the traditional method for tracing contagious virus, for example, determining the infection chain according to the memory of infected people, has many drawbacks. with the continuous spread of the pandemic, many countries or organizations have started to study how to use mobile devices to trace covid- , aiming to help people automatically record information about incidents with infected people through technologies, reducing the manpower required to determine the infection chain and alerting people at risk of infection. this article gives an overview on various bluetooth-based covid- proximity tracing proposals including centralized and decentralized proposals. we discussed the basic workflow and the differences between them before providing a survey of five typical proposals with explanations of their design features and benefits. then, we summarized eight security and privacy design goals for bluetooth-based covid- proximity tracing proposals and applied them to analyze the five proposals. finally, open problems and future directions are discussed. t he coronavirus disease of , referred to as covid- , has become a global pandemic and caused tens of millions of infected people and hundreds of thousands of death. the large-scale virus infection has caused tremendous impact on people's livelihood and the economy of many countries. many countries have to shut down cities to restrain the development of the pandemic and prevent people from working and traveling. therefore, how to effectively curb the spread of covid- has become one of the focuses of researches. traditionally, to trace people who may be at risk of infection, the infected person needs to actively recall where they have been and who they have contacted during the infection period. experts trace the people at risk of infection by constructing a relationship network and isolate them to cut off the source of infection. however, relying on the memory of the infected person is likely to miss key information. 
when the infected person went to a place where there were lots of people gathered, he/she could not enumerate those strangers who had come into close contact with him/her, which made it difficult for experts to analyze. m. shen and y. wei are with the school of computer science, beijing institute of technology, beijing , china (e-mail: shenmeng@bit.edu.cn, weiyaqianbit@foxmail.com). t. li is with labs, huawei. shenzhen, , china (e-mail: li.tong@huawei.com). t. li is the corresponding author (e-mail: li.tong@huawei.com). after the covid- outbreak, many countries or organizations have begun to study the use of technological means to trace people who may be infected and deployed applications accordingly. these applications are expected to reduce the labor required to determine infection chains and improve the accuracy of tracing virus infections. there are already dozens of covid- tracing applications. due to the inevitable need to collect certain user information, how to protect their security and privacy has become the focus of researchers. the tracing applications can be divided into three categories based on the data collected: location data, proximity data and mixed data that includes the former two. location data can be obtained by using global positioning system (gps) to identify user's latitude and longitude, while proximity data can be obtained by using the bluetooth function on the mobile device. bluetooth classifies close contacts with a significantly lower false positive rate than gps, especially in indoor environments, and it consumes lower battery [ ] . these bluetooth-based applications are basically created based on five bluetoothbased covid- proximity tracing proposals. in this article, we focus on the bluetooth-based covid- proximity tracing proposals, which can be divided into centralized proposals and decentralized proposals. firstly, we summarized the basic workflows of the two categories of proposals and the differences between them. then we specifically analyzed two decentralized and three centralized proposals' generation algorithms of anonymous ids, locally stored data, uploaded data and so on. moreover, we summarized eight security and privacy design goals of proximity tracing proposals and analyzed the five proposals according to them. we found that none of them has achieved the goals. finally, we shed light on open problems and opportunities of bluetooth-based covid- proximity tracing proposals. with the continuous spread of the covid- all over the world, many countries or organizations have successively announced bluetooth-based proximity tracing proposals. the following are five typical proposals. in asia, singapore announced a privacy preserving protocol called bluetrace [ ] . in europe, the pan-european privacy preserving proximity tracing project, referred to as pepp-pt [ ] , comprises more than members across eight european countries. frances inria and germanys fraunhofer, as members of pepp-pt, shared a robust and privacy-preserving proximity tracing protocol, referred to as robert [ ] . the decentralised privacy-preserving proximity tracing proposal, referred to as dp- t [ ] , is an open protocol that ensures personal data and computation stay entirely on an individuals phone, and this proposal was produced by a team of members from across europe. in north america, under the influence of dp- t [ ] , google and apple announced a two-phase exposure notification solution, referred to as gaen [ ] . 
in the first phase, they released application programming interfaces (apis) that allow applications from health authorities to work across android and ios devices. in the second phase, this capability will be introduced at the operating system level to help ensure broad adoption [ ] . many applications are created based on these five proposals. based on bluetrace, singapore deployed the application called tracetogether, which is the world's first bluetooth-based proximity tracing system deployed nationwide. the covidsafe application was also created based on bluetrace and announced by the australian government. pepp-pt has been implemented in germany and they deployed the application called ntk. the french government has deployed the stopcovid application based on robert to trace covid- . ketju based on dp- t was trialed in finland and it's among the first to use a decentralised approach to proximity tracing based on dp- t in europe. according to the role of the server in the proximity tracing proposals, bluetooth-based proximity tracing proposals can be divided into two categories. one is centralized proximity tracing proposals, such as bluetrace of singapore, pepp-pt of europe and robert of france. the other is decentralized proximity tracing proposals, such as gaen and dp- t of europe. figure (a) (b) shows the workflows of centralized and decentralized proposals, respectively. in the centralized proximity tracing proposals, users broadcast and receive encounter information (anonymous id, transmission time, etc.) via bluetooth. when users are infected with covid- , they can upload the encounter information to a central server, which analyzes the encounter information and determines whether any related user is at risk of infection and notifies them. the server plays a vital role in the workflow of centralized proposals and can handle the encounter information between users and analyze it. in the decentralized proximity tracing proposals, when users are infected with covid- , the keys related to the generation of anonymous ids is uploaded to the server. then the server simply passes the keys of these positive users to other users, who regenerate anonymous ids and analyze whether they are at risk of infection. the server only plays the role of storing and distributing keys uploaded. the difference analysis between centralized and decentralized proposals is shown in table i . the back-end server using the centralized proposals handles each user's pseudonym (unique pseudo-random identifier) and encounter information. the weakness is that it can associate all the anonymous ids of each user with his pseudonym. this allows operators of back-end servers to monitor user's behaviors. the centralized tracing proposals have been strongly criticized by privacy advocates and other stakeholders in the technical community, who believe that the centralized tracing proposals provide the government with information that can be used to reverseengineer personal information about individuals [ ] . singapore and italy have stated that they will switch from centralized applications to decentralized applications. the issue of trust has also prompted the german government favoring a centralized proposal before to adopt a decentralized one. the french parliament debated similar concerns. 
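to make the contrast between the two workflows concrete, the toy sketch below mimics them with plain python data structures: in the centralized flow the server resolves the anonymous ids uploaded by a positive user back to pseudonyms and builds the notification list itself, whereas in the decentralized flow it only republishes uploaded keys and each device regenerates and matches ids locally. all names and data structures here are illustrative and are not part of any of the actual protocols.

```python
# Toy contrast of the two tracing workflows (not a real protocol implementation).

def centralized_report(server_id_to_pseudonym, uploaded_encounters):
    """Centralized flow: the server maps uploaded anonymous ids to pseudonyms
    and determines who is at risk, so it learns the contact relationships."""
    at_risk = set()
    for anon_id in uploaded_encounters:              # ids heard by the positive user
        pseudonym = server_id_to_pseudonym.get(anon_id)
        if pseudonym is not None:
            at_risk.add(pseudonym)
    return at_risk

def decentralized_publish(bulletin_board, uploaded_keys):
    """Decentralized flow, server side: just store and redistribute the keys."""
    bulletin_board.extend(uploaded_keys)
    return bulletin_board

def decentralized_local_match(bulletin_board, regenerate_ids, heard_ids):
    """Decentralized flow, client side: regenerate ids from published keys and
    check them against the ids this device actually heard over bluetooth."""
    return any(anon_id in heard_ids
               for key in bulletin_board
               for anon_id in regenerate_ids(key))

# toy usage
server_db = {"id_a": "pseudo_1", "id_b": "pseudo_2"}
print(centralized_report(server_db, ["id_a", "id_x"]))                   # {'pseudo_1'}
board = decentralized_publish([], ["key_of_positive_user"])
print(decentralized_local_match(board, lambda key: ["id_a"], {"id_a"}))  # True
```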
in this section, the two decentralized proximity tracing proposals (table ii introduces gaen and three designs of dp- t) and the three centralized proximity tracing proposals (table iii introduces bluetrace, pepp-pt and robert) are analyzed. the two decentralized proposals have roughly similar processes, and the specific difference is reflected in the different algorithms for generating anonymous ids. in the low-cost design, the seed keys of one user are linkable. in the formula, h represents the hash function and t represents the current day, the seed key of which can be hashed to generate that of the next day. thus only the seed key of the first day is needed to generate all the anonymous ids for the next few information obtained by the server all the user pseudonyms, anonymous ids and encounter information uploaded by the users tested positive all the keys uploaded by users tested positive the role that the server plays analyze the information obtained and determine whether the related users may be at risk of infection store and distribute the keys the data volume communicated between the mobile device and the server the data uploaded by the users tested positive is small the server needs to periodically distribute keys to all the users, which means the data volume greatly exceeds that of the centralized proposals days. anonoid represents an anonymous id. prf is a pseudorandom function. prg is a pseudorandom generator. str is a fixed, public string. each seed key can be used to generate all the anonymous ids of the day. in the formula of the unlinkable design, epochs i are encoded relative to a fixed starting point shared by all the entities. left takes the leftmost bits of the hash output. this design generates a seed key for each epoch i and hashes it to generate anonymous ids. thus, all of these seek keys are unlinkable. the user can choose the time period for uploading, then the server regenerates these hash values based on the seed key uploaded by the user and puts them into a cuckoo filter before sending them to other users. compared with the low-cost design, the unlinkable design provides better privacy attributes with increased bandwidth. the hybrid design uses a time window w, whose length is an integer multiple of the anonymous id's valid period, to reduce the valid period of a seed key. the user can also select the time period and time window w for uploading. compared with low-cost designs, this design requires more bandwidth and storage space, but less than that of the unlinkable design. gaen is similar to the hybrid design of dp- t. it corresponds to the case where the time window of the hybrid design is one day but has been upgraded in the generation algorithm of anonymous id. in the formula, secseed represents secondary seek key and priseed represents primary seek key. hkdf is a key derivation algorithm. it first generates a primary seed key every day that is unassociated with each other. then it uses the primary seed key to generate a secondary seed key, which is used to generate an anonymous id. the servers in the three centralized proposals all grasp user pseudonyms, anonymous ids and encounter information uploaded. bluetrace needs to collect user's phone number and associate the number with user's pseudonym. their differences are also mainly reflected in the different algorithms for generating anonymous ids. 
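before turning to the server-side details of the centralized schemes, the sketch below makes the decentralized derivation just described concrete: a daily seed key obtained by hashing the previous day's key (as in the low-cost design), expanded via a prf and a prg-like construction into a day's worth of short anonymous ids. sha-256, hmac-sha256, the counter-based expansion, the label string, and the choice of 96 ids of 16 bytes per day are all illustrative stand-ins for the primitives and parameters fixed by the actual proposals.

```python
# Illustrative sketch (not any proposal's reference code) of seed-key rotation
# and anonymous-id generation in the decentralized designs described above.
import hashlib
import hmac

def next_day_seed(seed: bytes) -> bytes:
    """Low-cost design: the next day's seed is the hash of the previous one."""
    return hashlib.sha256(seed).digest()

def daily_anonymous_ids(seed: bytes, n_ids: int = 96, id_len: int = 16):
    """Expand one daily seed into n_ids short identifiers via a PRF plus a
    PRG-like counter construction (stand-ins for the fixed primitives)."""
    prf_out = hmac.new(seed, b"broadcast-ids", hashlib.sha256).digest()
    ids = []
    for i in range(n_ids):
        block = hashlib.sha256(prf_out + i.to_bytes(4, "big")).digest()
        ids.append(block[:id_len])
    return ids

seed_day1 = hashlib.sha256(b"initial device secret").digest()
seed_day2 = next_day_seed(seed_day1)                 # rotates every day
ids_today = daily_anonymous_ids(seed_day2)           # e.g. one id per 15 minutes
print(len(ids_today), ids_today[0].hex())
```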
in addition to generating an anonymous id using a key known only to itself, the server in robert also uses the anonymous id and a key known only to itself to generate an encrypted country code, so that the proposal can be applied across countries. in these proposals, the server plays an important role. in the decentralized proposals, the users tested positive store the encounter information broadcast by other users on their mobile devices, and upload the keys that generate their own anonymous ids. in the centralized proposals, by contrast, the users tested positive store and upload the encounter information broadcast by other users. the server handles users' pseudonyms and anonymous ids, so as long as a user uploads the encounter information, the server can infer whether there are related users at risk of infection. in the decentralized proposals, because the server is only responsible for storing and distributing data, users need to upload keys so that other users can acquire the keys and regenerate anonymous ids to match. dp- t proposes three different decentralized designs with different bandwidth and privacy requirements. the low-cost design requires the minimum bandwidth and provides the weakest privacy. the unlinkable design requires the maximum bandwidth and provides the strongest privacy. the bandwidth and privacy of the hybrid design lie between those of the low-cost and unlinkable designs. gaen is similar to the hybrid design of dp- t, but it has a better anonymous id generation algorithm. in the three centralized proposals, the information collected from users and the anonymous id generation algorithms are different. bluetrace needs to collect the user's phone number, while robert does not. robert uses a secret key known only to the server to encrypt the country/region code as part of the encounter information, while the other two proposals do not. in the decentralized proposals, a user uploads keys related to his/her own anonymous ids to the server, whereas in the centralized proposals, the user uploads the encounter information related to other users' anonymous ids to the server. this section summarizes the security design goals required for bluetooth-based proximity tracing proposals based on the six types of threats proposed in the stride threat model of microsoft [ ] and analyzes the security of the five proposals. ) security design goals: eight security design goals are as follows. information confidentiality. attackers cannot obtain information transmitted by users through wireless communication. information integrity. when transmitting and storing encounter information, these proposals should ensure that the information is not tampered with by unauthorized entities, or that any tampering can be discovered afterwards. normal reception. a user can normally receive the information broadcast by the other users after granting the application permission. processing of big data. the application works normally when it receives a large amount of encounter information. avoidance of false contact. only when two users have close contact can they receive the information broadcast by each other. real identity. an attacker cannot claim to be a certain user. authorization. the user tested positive needs authorization or identity verification before uploading data to the server. non-repudiation. users cannot deny that they have had close contact with someone.
) security analysis of proposals: the analysis of these five proposals' achievement of the security design goals is as follows. information confidentiality. in all the proposals, a user broadcasts information to the other nearby users via bluetooth. in this process, attackers can use tools, such as sniffer, to obtain massages broadcast by users. but attackers cannot obtain valid information by analyzing these messages due to using the generation algorithm of anonymous ids. in decentralized proposals, only users who may be at risk of infection can do risk calculation. in centralized proposals, only servers can decrypt the encounter information and obtain confidential information about users. information integrity. in the two decentralized proposals, if the seed keys uploaded by users tested positive are tampered, other users cannot regenerate real anonymous ids based on the false keys. in gaen, anonymous ids and associated encrypted metadata (aem) are both encrypted. if they are tampered, the encounter information regenerated by users based on the real seed keys cannot match them. in dp- t, if the anonymous ids or the hash value of the anonymous ids in the encounter information is tampered, the anonymous ids regenerated based on the seed keys cannot match them. in pepp-pt, the anonymous ids in the encounter information are generated by the periodically changing seed keys. when the user pseudonym decrypted is invalid, it can be determined that the information has been tampered. in bluetrace, there are fields for integrity checking in the anonymous ids. in robert, the message authentication code (mac) in the encounter information can be used to check the integrity. normal reception. any proximity tracing system based on bluetooth low energy (ble) is vulnerable to active attackers. this attack may cause the normal recording of anonymous ids to stop working, thereby preventing a user from discovering the other users. this is an inherent problem with this method. processing of big data. when an attacker sends a large amount of encounter information to a user, the user's application may occupy too much memory to store the information, which may cause the application to crash. to solve this problem, the storage capacity can be set for the encounter information, but this method will also cause the application to be unable to receive more encounter information after the encounter information fills up the memory. none of these five proposals can deal with this problem. avoidance of false contact. for all the proposals, false contact incidents cannot be completely avoided. the attacker can record the information broadcast by a user and broadcast it to victims as quickly as possible. if the user is later tested positive, the victims will mistakenly believe that they are in danger. technically savvy attackers can use large antennas to artificially increase their broadcast range. for attackers without budget restriction, they may relay and broadcast anonymous ids extensively to create large-scale false contact incidents. all the proposals resist these attacks to the greatest extent by limiting the validity period of anonymous ids but it cannot solve this problem completely. real identity. all the proposals use specific encryption algorithms to prevent the attacker from deriving seed keys or user pseudonyms based on the collected anonymous ids, so the attacker cannot pretend to be a certain user. authorization. 
in all the proposals, users infected with covid- can upload data to the server only after being authorized by health authorities. non-repudiation. in the centralized proposals, the server handles user pseudonyms, anonymous ids generated based on the user pseudonyms and encounter information uploaded. when two users have proximity contacts, they will send encounter information including anonymous ids to each other. when one user is tested positive and uploads encounter information to the server, the other user cannot deny the proximity contact with him/her because the sever can get another user's pseudonym from the anonymous id in the encounter information. in the decentralized proposals, if one user is tested positive and uploads keys to the server and another user gets the keys, regenerates and matches the anonymous ids successfully, the user cannot deny the proximity contact with another user because the keys are only known to him/her. based on above analysis, we listed the achievement of the five proximity tracing proposals for eight security design goals, as shown in table iv . it can be seen that none of the five proposals can achieve the security design goals of normal reception, processing of big data and avoidance of false contacts. and all can achieve the design goals of confidentiality, integrity, real identity, authorization and non-repudiation. this section summarizes the eight privacy design goals required for the bluetooth-based proximity tracing proposals based on six data protection principles of general data protection regulation (gdpr) of the european union [ ] and analyzes the privacy of the five proposals. ) privacy design goals: the eight privacy design goals are as follows. right of access. users shall have the right to obtain confirmation as to whether or not personal data concerning them are being processed, and where that is the case, access to the personal data and the following information: the purposes of the processing, the categories of personal data concerned, the period that personal data will be stored etc. data minimisation. adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed. right to erasure. users shall have the right to obtain the erasure of personal data concerning them without undue delay and the system shall have the obligation to erase personal data without undue delay. storage limitation. save the data for necessary limited time and then erase it. untraceability. users' locations cannot be exposed based on the information broadcast by them. protection of infected users. the identity of infected users should not be exposed to unauthorized entities. protection of risky users. the identity of risky users should not be exposed to unauthorized entities. protection of interaction information. the interaction information which reflects close-range physical interactions between users should not be exposed to unauthorized entities. ) privacy analysis of proposals: the analysis of the five proposals' privacy design goals is as follows. right of access. in all proposals, the applications provide introduction to users before they use specific functions. they inform users the permission users should grant, the purpose of collecting the data, the data they will collect, the period that personal data will be stored etc. data minimisation. all of the five proposals indicated that the location information of mobile phones would not be collected. 
the processing amount of personal data is limited to the minimum amount of data required by the system, and no unnecessary personal data is collected right to erasure. in the five proposals, users have the right to stop using the applications and delete personal data at any time. storage limitation. all of the five proposals limit the number of days the data can be kept. once the data expires, it will be deleted, ensuring the accuracy of the stored data. untraceability. in the two decentralized proposals, since users use encryption algorithms to generate anonymous ids that change periodically, other entities cannot link to users by analyzing anonymous ids they broadcast. in the three centralized proposals, other entities cannot link to user pseudonyms by analyzing anonymous ids unless they have encryption keys. but encryption keys are only handled by servers. consequently, other entities cannot link to users by analyzing the anonymous ids they broadcast. protection of infected users. the data uploaded by the infected user is not related to personal information. in the decentralized proposals, the data uploaded is keys, and in the centralized proposals, it is the encounter information. however, attackers can determine that the user is uploading a large amount of data to the server by tracing phone numbers of health authorities or observing the network traffic, inferring that the user has been tested for covid- and diagnosed. these attackers can be internet service providers (isps), network operators, or hackers who set up malicious access points or sniff public wifi networks. protection of risky users. in the two decentralized proposals, the server sends seed keys uploaded by the infected users to the other users, who use the seed keys locally to regenerate the anonymous ids and calculate the risk score. these seed keys are not associated with the identity of the user at risk of infection, so the decentralized proposals will not disclose information about the user at risk of infection to the others. in pepp-pt and robert, all the users will periodically request to the server to update the risk score. the format of the reply is the same regardless of whether the user has a risk of infection. therefore, if the server is credible and the communication channel is confidential, the eavesdropper cannot distinguish which user has a risk of infection. in bluetrace, because the server uses user's phone number to notify them of the risk of covid- infection, the information of the user with high risk may be leaked through the attacker's tracking of the health authority's phone number. protection of interaction information. in the two decentralized proposals, the system will not disclose any information about the interaction between two users to any entity. the anonymous ids derived from the keys uploaded by an infected user has nothing to do with whoever had interacted with this user. in the three centralized proposals, only the server can learn an infected user's interaction information by analyzing the encounter information uploaded by the user. if the server is trusted, the other unauthorized parties will not learn about these interaction information. based on the above analysis, we listed the achievement of the five proximity tracing proposals for the eight privacy design goals, as shown in table v . 
it can be seen that all the proposals have achieved the same privacy in terms of right of access, data minimisation, right to erasure, storage limitation, untraceablility and protection of interaction information. and none of them achieves the privacy goal of protecting infected users. for the privacy design goal of protecting risky users, since bluetrace needs to collect users phone numbers, it may disclose the information of users at risk of infection. in terms of privacy analysis, bluetrace achieves less privacy design goals than the other proposals. at present, the research on bluetooth-based proximity tracing proposals is still in the stage of continuous exploration, and researchers face many challenges during this process. precise proximity measurement and risk calculation are the key steps in tracing covid- . gaen standardizes four scores which are attenuationscore, dayssincelastexpo-surescore, durationscore and transmissionriskscore, and then it multiples these scores to calculate the risk value of infection. swisscovid, which is based on gaen, divides the attenuation into three intervals by using two attenuation values before assigning different weight values to each interval. it uses gaen api to request users continuous attenuation time in different intervals and get the risk value of infection by calculating the weighted sum of continuous attenuation time. gaen is still evolving, and the measurement and calibration between different operating systems and different mobile phone models of these parameters such as attenuation values, continuous contact time, thresholds and weights are still not completed. to accurately estimate the distance between two users, gaen released a bluetooth low energy rssi calibration tool to calibrate as many devices as possible. it collects the rssi correction and the transmit power of different mobile phone models to improve the calculation consistency of attenuation values of all devices [ ] . gaen currently uses this rough [ ] and tried it out on the isle of wight, but many technical challenges have been identified through system testing. measuring the distance between users may consider the mutual enhancement of bluetooth and ultrasonic ranging. in bluetooth-based proximity tracing proposals, mobile devices broadcast anonymous ids using bluetooth low energy (ble), in which the attenuation of bluetooth signals is generally used to indirectly represent the distance between users. in addition, ultrasound is also a way to measure distance, which is more accurate and does not depend on special hardware. in a scenario where the distance between users is greater than the officially considered safe distance, the attenuation of the bluetooth signal can be used to represent the distance between users, because this scenario does not require an accurate distance measurement. when the users are in close contact, for example, when the two users perform handshake and other close actions, ultrasonic assisted ranging perhaps can be triggered as needed at this time to provide calibration for distance measurement. the researchers conducted experiments on the privacy and security risks of gaen in the real world. in this experiment, the researchers proved that the current framework design is vulnerable to two kinds of attacks [ ] . one attack is to profile the infected person and possibly de-anonymize it. the researchers used mobile devices as a bluetooth sniffer to capture anonymous ids broadcast by them passing through six locations. 
the captured data appeared random and could not be associated with a single user. however, after a user is tested positive and continuously uploads the primary keys, the result is completely different. by generating a users anonymous ids and matching with the anonymous ids received by the bluetooth sniffers at six locations, they can accurately know which locations the user has visited, and the users route map can be portrayed based on the time information. thus, they can collect a lot of information about the user and cancel its anonymity. because the code of gaen is not open source and the api can only be used by health authorities, an analog tracker that conforms to the anonymous id encryption specification in the gaen api is used in this experiment. another type of attack is a relay-based wormhole attack, in which an attacker constructs a fake contact event and may seriously affect a tracing application built on gaen. the researchers built a multi-location wormhole by integrating bluetooth low energy (ble) and the raspberry pi. first, the worm device sends the encounter information collected from a location to the central message queuing telemetry transport (mqtt) server. the server distributes the received messages among the worm devices. these devices will copy the beacon within the validity period of the anonymous id ( minutes) and rebroadcast. finally, the researchers established a logical connection between the mobile devices kilometers away, but in fact they did not have real contacts. this wormhole attack budget is relatively low, and attackers can use higher-than-normal signal strength and/or high-gain antennas to significantly increase the scope of each wormhole device. therefore, an attacker may establish false connections between a large number of users and expand the number of people who need to be tested and isolated, causing unnecessary panic. because gaen is unavailable, the researchers used dp- t that inspired gaen as an alternative. all the covid- tracing applications designed based on gaen are vulnerable to these two attacks. for the centralized proposals, since the server handles every users pseudonym, each user can be monitored. thus it is necessary to ensure that the server is credible and will not disclose information. to promote the progress of these proposals in terms of security and privacy, governments or research institutions can open-source their proposals and use everyone's power to find a better evolutionary direction. gaen develops a bluetooth-based proximity tracing system on android and ios platforms to improve the security and privacy of the bluetooth function used in the proposal, but the framework may not be available on other platforms [ ] . the european countries believe that apples mobile phones restrict the use of bluetooth background scanning by third-party applications. their users mobile devices need to keep bluetooth on and active at all times, which will negatively affect battery life and device availability, making their own proposals impossible and turning to gaen to build applications. in addition, when performing accurate proximity measurements based on radio signal strength, devices with different technical characteristics need to be considered. to ensure the validity of the proximity tracing applications, a large number of users must download these applications and grant application permissions. when only a small number of people choose to use tracing applications, none of these proposals can play their true role. 
To protect users' privacy, users infected with COVID-19 can decide whether to upload data to the server in the five proposals discussed above; if only some users choose to upload data, these proposals cannot effectively trace COVID-19. The applications are also unavailable in areas lacking 3rd- or 4th-generation mobile communication technologies, and the elderly, children, and people in difficult family circumstances may not have suitable mobile devices and consequently cannot use these applications. For example, smartphone penetration in India and Bangladesh is very low [ ]. Moreover, in some countries or regions that are highly concerned about personal privacy, the security and privacy risks of the applications are another reason that prevents people from using them. Governments should increase publicity for this type of application while protecting users' safety and privacy, and try to implement the functionality on other portable devices to lower the barrier to using it [ ]. In any country, a strong COVID-19 testing capability is the basis for preventing the spread of the virus. Health authorities must be able to test accurately and at large scale whether people are infected with COVID-19 so that these auxiliary tracing proposals can fulfil their function. If users cannot get tested promptly and receive accurate results when they are informed of a risk of COVID-19 infection, their motivation to use such applications will decline. Governments should take responsibility for COVID-19 testing, providing convenient and affordable testing for the public. With the global COVID-19 pandemic, how to use technology to assist in tracing and suppressing the spread of the virus has become one of the focuses of researchers. This article gives an overview of Bluetooth-based COVID-19 proximity tracing proposals. We categorized the protocols into two categories and summarized the differences between them. We then analyzed the five protocols in detail and summarized their features and benefits. For a deeper understanding, we summarized eight security and privacy design goals for proximity tracing proposals and analyzed how well the five proposals achieve them, finding that none of them achieves all of the design goals. Moreover, we shed light on the numerous open issues and opportunities that need further research effort, from the perspectives of technical requirements and community building. References:
- BlueTrace: a privacy-preserving protocol for community-driven contact tracing across borders
- PEPP-PT: Pan-European Privacy-Preserving Proximity Tracing
- ROBERT: robust and privacy-preserving proximity tracing
- Decentralized privacy-preserving proximity tracing
- Exposure Notifications: using technology to help public health authorities fight COVID-19
- FAQs for Exposure Notifications
- COVID-19 digital contact tracing applications
- Threat modeling: designing for security
- The European Union General Data Protection Regulation: what it is and what it means
- Exposure Notifications BLE attenuations
- COVID-19 app documentation
- Mind the GAP: security & privacy risks of contact tracing apps
- COVID-19 contact tracing: challenges and future directions
- Digital contact tracing for COVID-19
key: cord-qp kq authors: Klopfenstein, Lorenz Cuno; Delpriori, Saverio; Di Francesco, Gian Marco; Maldini, Riccardo; Paolini, Brendan Dominic; Bogliolo, Alessandro title: Digital Ariadne: citizen empowerment for epidemic control date: journal: nan doi: nan sha: doc_id: cord_uid: qp kq
The COVID-19 crisis represents the most dangerous threat to public health since the H1N1 influenza pandemic. So far, the disease due to the SARS-CoV-2 virus has been countered with extreme measures at the national level that attempt to suppress epidemic growth. However, these approaches require quick adoption and enforcement in order to effectively curb virus spread, and may cause unprecedented socio-economic impact. A viable alternative to mass surveillance and rule enforcement is harnessing collective intelligence by means of citizen empowerment. Mobile applications running on personal devices could significantly support this kind of approach by exploiting context/location awareness and data collection capabilities. In particular, technology-assisted location and contact tracing, if broadly adopted, may help limit the spread of infectious diseases by raising end-user awareness and enabling the adoption of selective quarantine measures. In this paper, we outline general requirements and design principles of personal applications for epidemic containment running on common smartphones, and we present a tool, called 'Diary' or 'Digital Ariadne', based on voluntary location and Bluetooth tracking on personal devices, supporting a distributed query system that enables fully anonymous, privacy-preserving contact tracing. We look forward to comments, feedback, and further discussion regarding contact tracing solutions for pandemic containment. The novel coronavirus SARS-CoV-2 and its rapid spread have established a pandemic of global proportions over the course of the first months of 2020. High fatality rates detected in the first affected regions are expected to be even higher in countries with an older population, low income, or a lack of suitable health-care facilities [ ]. In the absence of a viable vaccine, the spread of the disease caused by the SARS-CoV-2 virus has so far been countered in many countries with countermeasures that attempt to suppress epidemic growth, thus avoiding overwhelming the healthcare system with an unmanageable number of patients. The reduction of contagion is achieved through a set of increasingly severe measures that limit personal freedom and entail strong socio-economic drawbacks, going well beyond mass-gathering prohibition and case isolation. Social distancing rules, including school and university closures, household quarantine, internal and cross-border mobility constraints, and selective closure of non-essential productive and commercial activities, have brought many countries to complete lockdown [ ]. All these approaches require quick adoption and strict enforcement in order to effectively curb virus spread in the short term.
As observed in the 1918 influenza pandemic, there is a strong correlation between excess mortality and how early containment measures are taken; containment interventions that are introduced too late or lifted too early were shown to have very limited effect [ ]. In the long term, governments must trade off the adoption of dramatic full-lockdown measures against more lax interventions, for instance progressively adopting temporary, small-scale contagion suppression actions that aim at keeping the virus's reproduction number, R0, at a level that does not drive demand beyond the healthcare system's capacity. Adaptive adoption of these kinds of containment policies at a regional level is expected to be effective even if enforced for shorter periods of time [ ]. The triggering of circumscribed quarantine measures can be directed through the widespread adoption of technological tools that allow tracing contacts and interactions with known cases of contagion [ ]. Mobile apps running on personal smartphones are especially attractive as solutions because they enable immediate deployment on existing hardware and a quick response [ ]. Several approaches of this kind have been proposed over the course of the last weeks, giving rise to a growing debate around privacy implications and potential risks of mass surveillance and stigmatization [ , ], which have prompted authorities to provide recommendations and guidelines [ ], and big players to develop ad hoc cross-platform protocols [ ]. In this paper, we suggest that these contact tracing tools should be designed to support end-user empowerment, as opposed to mass surveillance, granting citizens more data, awareness, and control, as envisioned by Nanni et al. [ ]. We first outline the basic requirements and the founding principles on which such tools should be based, and then present a location/contact tracing solution, composed of a mobile app and a distributed query system, designed to meet these critical requirements. The proposed system allows individuals to keep track of movements and contacts on their own private devices and to use local traces to select relevant notifications and alerts from health authorities, thus completely eschewing, by design, any risk of surveillance. Taking end-user empowerment as the founding principle, we outline requirements and design principles that address both regulatory and technical issues. Compliance with national and international regulation is essential to protect natural persons and their fundamental rights and freedoms, while technical requirements are mainly meant to reconcile dependability needs with the features of general-purpose personal devices, characterized by software fragmentation, hardware diversity, a variety of non-exclusive usage modes, lack of calibration, limited resources, and untrained users.
A. Collective intelligence. Systems based on the voluntary participation of individuals, performing a collective effort in pursuit of a common goal, leverage a form of collective intelligence, which is the only alternative to mass surveillance and enforcement. ICT solutions should support and encourage such collaborative behaviours.
B. Social responsibility. Individual participation in a collective effort towards a common goal is an act of social responsibility. The technology adopted must make the social value of the end-user's behaviour clearly perceptible.
C. Awareness and control. Technology is not infallible and systems may not always behave as expected.
Mobile apps should not induce end-users to simply rely on them. Rather, they should empower end-users by granting them control of and insight into the data gathering process, and by allowing them to browse their data and possibly add spontaneous notes.
D. Privacy and anonymity by design. Protection of sensitive data, such as locations or health-related information, cannot rely exclusively on trust or security promises. The system must be designed to keep user data private at all times, ideally storing them exclusively on the user's device, and to make identification impossible a posteriori.
E. Technology agnosticism. Contact tracing is a challenging task. In spite of the many approaches that have been proposed, no single technology has proven to offer the ultimate solution. For instance, location services have limited accuracy, especially indoors, while Bluetooth proximity does not reveal exposure to indirect contagion (through infected surfaces). The solution of choice should exploit all available technologies and be open to any improvement or integration.
F. Effectiveness. The effectiveness of containment measures based on the voluntary adoption of a mobile app strongly depends on the percentage of the population making proper use of that app. Although this is always true, each solution has to be evaluated in different scenarios, including those well below the nominal critical mass of the target technology. Two types of performance indicators have to be used, measuring the support that the app can provide both to end-users tested or diagnosed positive for COVID-19 who are willing to cooperate with health authorities, and to all other individuals possibly infected by them, who should take timely countermeasures.
G. Interoperability. Interoperability must be pursued as much as possible, in order to reduce the critical-mass requirements of each single system and to fully exploit their potential. To this purpose, open standard protocols should be preferred to closed ad hoc ones, cooperation among institutions must be technically supported, and integration with synergistic healthcare systems must be enabled.
H. Openness of source code. Open-source access to all system components is key to speeding up development, ensuring continuous improvement, and guaranteeing coherence between specification and implementation. Transparency is essential both for end-users and for health authorities possibly adopting the solution.
I. Openness of statistical data. Statistical data can provide valuable information to evaluate the effectiveness of epidemic containment, to monitor contagion, and to drive timely decisions. All statistical information that can be provided by end-users on a voluntary basis, without jeopardizing their privacy and anonymity, is worth being gathered and made available as an open dataset. Open data enable study and research without providing questionable competitive advantages to any player.
J. Avoidance of false alarms. The ultimate goal of contact tracing systems is to reach susceptible or asymptomatic individuals who are considered to be the target of specific measures (e.g., quarantining or testing) according to the containment policies adopted. The solution adopted must minimize unneeded alarms, which can overwhelm the healthcare system and spread panic.
L. Scalability. The higher the adoption rate, the more effective the solution. Hence, scalability is a key requirement.
Since the target devices, i.e., smartphones, have their own storage, computation, sensing, and communication resources, scalability can be inherently achieved by exploiting local resources as much as possible without triggering any network effect. Digital Ariadne, or 'Diary', is a privacy-preserving open-source tool, developed by DIGIT srl and the University of Urbino, that allows users to trace their movements and contacts, while also allowing governments or healthcare agencies to rapidly direct their epidemic containment efforts, in a way that aligns with the principles outlined above. The system is composed of: a mobile application, voluntarily installed by users on their smartphones, which keeps track of their locations through the device's GPS sensor and of interactions with other users through Bluetooth radio beacons; a privacy-aware reward system, which incentivizes app usage while collecting anonymous usage information to feed an open data set; and a distributed query system that allows recognized public authorities to selectively and anonymously notify users about possible contagion sources. The mobile app works in the background, with careful usage of battery and storage and without impairing the functioning of the personal device. Nonetheless, it provides a rich user interface to make end-users fully responsible for, and aware of, their own contribution to epidemic containment. Source code of the mobile applications and the back-end service, both currently in active development, is available on GitHub. The Digital Ariadne mobile application is developed using the Flutter framework for Apple iOS and Google Android. A combination of movement detection with the built-in accelerometers, activity recognition, data from GPS sensors, and Bluetooth Low Energy (BLE) transmission is used to adaptively track the user's movements and interactions without negatively impacting the battery and storage capacity of the device. Traces are collected autonomously by a background service launched by the application, but end-users can decide at any time to interact with the app to browse stored data, force sampling, mark known locations or add notes. Tracking status is displayed on the main app screen, as shown in the figure: three concentric circles, representing the hours of the current day, show the detected movements, the amount of time spent in known locations, and the notes added by the end-user. This information is stored on the device for a limited number of days. The app requires only a small amount of storage, on the order of megabytes, for a day of full tracking (slightly more in case of frequent movement). Once activated by the user, the app starts tracking the device's location and records detected positions and movements. Location tracking is adaptive, based on the user's activity and speed, in order to provide sufficient precision when the user is moving and low battery consumption otherwise (a minimal sketch of such a policy is given below). The user may voluntarily mark known locations through the app interface, specifying places such as home, workplace, school, or any other location. Arrival at and departure from these locations are detected through geofencing and give the user a quick overview of his or her movements throughout the day. The user may also add notes for a specific location and time of day, in order to remember events or situations that may be relevant for contact tracing purposes.
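The following is a minimal sketch of one possible adaptive sampling policy of the kind just described; the speed thresholds and interval values are illustrative assumptions, not the parameters actually used by the Diary app.

    def next_sampling_interval_s(speed_m_s: float, activity: str) -> int:
        """Choose how long to wait before requesting the next location fix."""
        if activity == "still" or speed_m_s < 0.5:
            return 600      # essentially stationary: sample every 10 minutes
        if speed_m_s < 2.0:
            return 120      # walking pace: every 2 minutes
        if speed_m_s < 8.0:
            return 60       # cycling or slow vehicle: every minute
        return 30           # fast vehicle: every 30 seconds

    # Example: the background service would call this after each fix.
    print(next_sampling_interval_s(1.2, "walking"))   # -> 120

Slowing the sampling rate when the user is stationary is what keeps battery and storage usage low without losing precision during movement.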
User movements, known locations, and notes are shown on an interactive map (see the figure). The Diary app makes use of the Temporary Contact Numbers (TCN) contact tracing protocol in order to broadcast randomly generated, anonymous identifiers, which are updated every few minutes [ ]. Every installation of the app keeps a fully local log of each identifier that has been broadcast and each identifier that was received from other devices using the same TCN protocol. This data, which expires together with the location data after a fixed number of days, never leaves the phone and is not linked to private user information. The Digital Ariadne system makes use of a server-side component that collects daily usage statistics in an anonymous fashion. Users are never required to identify themselves and no user-identifying information is transmitted at any point. Communication between mobile applications and the back-end uses secure connections based on widely adopted standards (HTTPS with optional certificate pinning). An anonymous installation ID is generated upon the first launch of the mobile application. This ID is a randomly generated UUID and is used only to distinguish individual installations for statistical data collection and aggregation; installation IDs are not linked to private user information or device characteristics. Daily statistics are reported for each installation. The collected information, made available as an open data set, gives an indication of how the mobile app is used, allowing researchers and policy makers to gauge the effectiveness of measures adopted at regional or national level. While personal data never leave the user's device and the collected statistical data cannot be used to identify users, Digital Ariadne is designed to give designated territorial or national authorities access to the system through a dashboard that allows them to publish epidemic-related "calls to action". Calls to action can be seen as geographical and temporal distributed queries that operate through the following process: (a) an authority creates a new call to action based on a sequence of geolocated and timestamped points and/or a set of temporary contact numbers; (b) the call to action is stored by the back-end service until it expires; (c) the mobile app automatically downloads relevant calls to action; (d) the mobile app matches calls to action against private location and contact data, in order to verify whether the user has been exposed to possible sources of contagion; (e) if there is a match, the user is privately notified and can access the information associated with the call to action. Matching users may also choose to interact directly with the public authority, optionally disclosing part of their traces. A call to action is thus composed of a series of geographical regions (geo-polygons), associated time intervals, and a series of temporary contact numbers (TCNs). The match is performed by checking whether the user has been within an indicated region in the given time period, or whether any reported TCN is found among the local records. The sensitivity of the match can be fine-tuned by the health authority, by indicating a maximum distance from the region and a minimum time interval of match (i.e., exposure) required to alert the user, with the intent of reducing panic and avoiding unnecessary alarms.
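The following is a minimal sketch of how this on-device matching step could look. It simplifies regions to circles (centre plus radius) rather than geo-polygons, and all field names and thresholds are illustrative assumptions, not the actual Diary implementation.

    import math

    def haversine_m(lat1, lon1, lat2, lon2):
        """Approximate great-circle distance in metres between two coordinates."""
        r = 6371000.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = math.radians(lat2 - lat1)
        dl = math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def matches_call_to_action(cta, local_samples, local_tcns):
        """Return True if the locally stored trace matches a call to action.

        cta: dict with 'regions' (list of (lat, lon, radius_m, t_from, t_to)),
             'tcns' (set of hex strings), 'max_extra_distance_m', 'min_overlap_s'.
        local_samples: list of (lat, lon, timestamp) recorded on the device.
        local_tcns: set of TCNs heard over Bluetooth, also stored locally.
        """
        # 1. Contact match: any reported TCN was heard locally.
        if cta["tcns"] & local_tcns:
            return True
        # 2. Location match: enough dwell time inside (or near) a reported region.
        for lat, lon, radius_m, t_from, t_to in cta["regions"]:
            dwell, prev_t = 0, None
            for slat, slon, t in sorted(local_samples, key=lambda s: s[2]):
                inside = (t_from <= t <= t_to and
                          haversine_m(slat, slon, lat, lon)
                          <= radius_m + cta["max_extra_distance_m"])
                if inside and prev_t is not None:
                    dwell += t - prev_t
                prev_t = t if inside else None
            if dwell >= cta["min_overlap_s"]:
                return True
        return False

As described above, this check runs entirely on the user's device against locally stored traces; nothing is sent to the server as part of the match.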
Creating calls to action in the case of contagion. When Diary users are positively diagnosed, they may grant healthcare or government authorities partial access to their local traces, including geolocations, timestamps, and the list of temporary contact numbers generated by the Diary app. This information can be used to generate anonymous calls to action that are made accessible to all instances of the app in the area of interest. Calls to action are processed locally by each installation and displayed to end-users if and only if their traces match the filtering criteria. This mechanism enables anonymous tracking of past interactions of diagnosed individuals, alerting potentially infected Diary users and prompting them to self-isolate and seek further testing. To further raise awareness and promote adoption and usage of the application, Diary integrates with the 'Worth One Minute' platform. The platform has adopted Diary as an instrument for the common good and rewards users with anonymous vouchers (called WOMs) for their collaborative behaviour [ ]. These vouchers are intended to provide: (a) a simple gamified experience that allows users to earn points and thus obtain positive feedback on their voluntary contribution to epidemic containment; (b) a tangible currency-like reward that can be adopted as a voucher system at local and national scale to promote microeconomic growth in a post-lockdown scenario; (c) a perception of the social value of individual actions and behaviours. In this paper, we have argued that citizen empowerment should be the foundation on which novel epidemic control technologies are built, as a viable alternative to mass surveillance. General design principles driving the development of such technologies have been outlined and applied to the design of Digital Ariadne, an open-source, privacy-preserving instrument that combines location and contact tracing capabilities to collect local traces that can be cross-matched with authoritative alerts and calls to action without leaving the end-user's device. Just like Ariadne's thread, the data stored on personal smartphones offers a trusted trace to find a way out of the maze of COVID-19. We invite any kind of feedback on this whitepaper, including comments on the design principles and technical contributions to the open-source Diary project. References:
- Age-dependent risks of incidence and mortality of COVID-19 in Hubei province and other parts of China. medRxiv
- Ferguson: The effect of public health measures on the influenza pandemic in U.S. cities
- Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing
- Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts
- Support for app-based contact tracing of COVID-19
- Contact tracing mobile apps for COVID-19: privacy considerations and related trade-offs
- Commission recommendation on a common Union toolbox for the use of technology and data to combat and exit from the COVID-19 crisis, in particular concerning mobile applications and the use of anonymised mobility data
- Apple and Google partner on COVID-19 contact tracing technology
- Nanni et al. (including Jeroen van den Hoven and Alessandro Vespignani):
Give more data, awareness and control to individual citizens, and they will help COVID-19 containment
- A global coalition for privacy-first digital contact tracing protocols to fight COVID-19
- "Worth One Minute": an anonymous rewarding platform for crowdsensing systems
The authors wish to thank the beta testers who signed up for testing and the users who have provided valuable feedback over the last three weeks.
key: cord-y yisfk authors: Chan, Justin; Foster, Dean; Gollakota, Shyam; Horvitz, Eric; Jaeger, Joseph; Kakade, Sham; Kohno, Tadayoshi; Langford, John; Larson, Jonathan; Sharma, Puneet; Singanamalla, Sudheesh; Sunshine, Jacob; Tessaro, Stefano title: PACT: Privacy sensitive protocols and mechanisms for mobile contact tracing date: journal: nan doi: nan sha: doc_id: cord_uid: y yisfk
The global health threat from COVID-19 has been controlled in a number of instances by large-scale testing and contact tracing efforts. We created this document to suggest three functionalities and to describe how we might best harness computing technologies to support the goals of public health organizations in minimizing morbidity and mortality associated with the spread of COVID-19, while protecting the civil liberties of individuals. In particular, this work advocates a third-party-free approach to assisted mobile contact tracing, because such an approach mitigates the security and privacy risks of requiring a trusted third party. We also explicitly consider the inferential risks involved in any contact tracing system, where any alert to a user could itself give rise to de-anonymizing information. More generally, we hope to participate in bringing together colleagues in industry, academia, and civil society to discuss and converge on ideas around a critical issue arising with attempts to mitigate the COVID-19 pandemic. Several communities and nations seeking to minimize death tolls from COVID-19 are resorting to mobile-based contact tracing technologies as a key tool in mitigating the pandemic. Harnessing mobile computing technologies is an obvious means to dramatically scale up conventional epidemic response strategies to perform tracking at population scale. However, straightforward and well-intentioned contact-tracing applications can invade personal privacy and provide governments with justification for data collection and mass surveillance that are inconsistent with the civil liberties that citizens will and should expect, and demand. To be effective, acceptable, and consistent with the need to observe commitments to privacy, we must leverage designs and computing advances in privacy and security. In cases where it is valuable for individuals to share data with others, systems must provide voluntary mechanisms in accordance with ethical principles of personal decision making, including disclosure and consent. We refer to efforts to identify, study, and field such privacy-sensitive technologies, architectures, and protocols in support of mobile tracing as PACT (Privacy sensitive protocols And mechanisms for mobile Contact Tracing). The objective of PACT is to set forth transparent privacy and anonymity standards which permit the adoption of mobile contact tracing efforts while upholding civil liberties.
Figure: the basic idea is that users broadcast signals ("pseudonyms"), while also recording the signals they receive. Notably, this colocation approach avoids the need to collect and share absolute location information. Credit: M. Eifler.
This work specifies a third-party-free set of protocols and mechanisms to achieve these objectives. While approaches that rely on trusted third parties can be straightforward, many people naturally oppose the aggregation of information and power that such a party represents, the potential for misuse by a central authority, and the precedent that such an approach would set. It is first helpful to review the conventional contact tracing strategies executed by public health organizations, which operate as follows: positively tested citizens are asked to reveal (voluntarily, or enforced via public health policy or by law, depending on the region) their contact history to public health officers. The public health officers then inform other citizens who have been at risk of exposure to the infectious agent based on co-location, via some definition of co-location, supported by look-up or inference about locations. The citizens deemed to be at risk are then asked to take appropriate action (often to either seek tests or to quarantine themselves and to be vigilant about symptoms). It is important to emphasize that the current approach already makes a trade-off between the privacy of a positively tested individual and the benefits to society. We describe mobile contact-tracing functionalities that seek to augment the services provided by public health officers, by enabling the following capabilities via computing and communications technology:
• Mobile-assisted contact tracing interviews: a citizen who becomes ill can use this functionality to improve the efficiency and completeness of manual contact tracing interviews. In many situations, the citizen can speed up the interview process by filling in much of a contact interview form before the interview even starts, reducing the burden on public health authorities. Privacy sensitivity is ensured because all the data remains on the user's device, except for what they voluntarily decide to reveal to health authorities in order to enable contact tracing. In advance of making a decision to share, they are informed about how their data may be used and the potential risks of sharing.
• Narrowcast messages: public health authorities can make available custom-tailored messages to specific, relevant subsets of citizens. For example, the following messages might be issued: "if you visited the X eldercare center on the indicated dates in March, please email yy@hhhealth.org", or "please refrain from entering playground Z until the indicated date in April because it needs to undergo decontamination." A mobile app can download all of these messages and display those relevant to a citizen based on the app's sensory log or potential future movements. This capability allows public health officials to quickly warn people when new hotspots arise, or to canvass for general information. It enables a citizen to be well informed about extremely local, pandemic-relevant events.
• Privacy-sensitive mobile tracing: proximity-based signals seem to provide the best available contact sensor from one phone to another; see the figure above for the basic approach. Proximity-based sensing can be done in a privacy-sensitive manner: with this approach, no absolute location information is collected or shared. Variants of proximity-based analyses have been employed in the past for privacy-sensitive analyses in healthcare [ ]. Taking advantage of proximity-based signals can speed the process of contact discovery and enable contact tracing of otherwise undiscoverable people, like a fellow commuter on the train.
This can also be done with a third-party-free approach providing privacy trade-offs similar to those of manual contact tracing. This functionality enables someone who has become ill with symptoms consistent with COVID-19, or who has received confirmation of infection through a positive test, to voluntarily and under a pseudonym share information that may be relevant to the wellness of others. In particular, the system can manage, in a privacy-sensitive manner, data about individuals who came in close proximity to them over a period of time (e.g., the last two weeks), even if there is no personal connection between these individuals. Individuals who share information do so with disclosure of, and consent around, the potential risks of private information being shared. We further discuss disclosure, security concerns, and re-identification risks below. Importantly, these protocols by default keep all personal data on a citizen's phone (aside from pseudonymous identifiers broadcast to other local devices), while enabling these key capabilities; information is shared only via voluntary disclosure actions, with the implications relayed via careful disclosure. For example, if someone never tests positive for COVID-19, or tests positive but decides not to use the system, then *no* data is ever sent from their phone to any remote server; such individuals would be contacted by the standard contact tracing mechanisms arising from reportable disease rules. The data on the phone can be encrypted and can be set up to automatically time out based on end-user-controlled policies. This would prevent the dataset from being accessed or requested via legal subpoena or other governmental programs and policies. We specify protocols for all three separate functionalities above, and each app designer can decide which ones to use. These protocols notably have different value adoption curves: narrowcast and mobile-assisted contact tracing have a value that is linear in the average adoption rate, while privacy-sensitive mobile tracing has a value that is quadratic in the average adoption rate, because it requires both ends of a contact to be running the app (a small numerical illustration is given below). This quadratic dependence implies low initial value, so we expect narrowcast and mobile-assisted contact tracing to provide initial value during adoption, while privacy-sensitive mobile tracing provides substantial additional value once adoption rates are high. We note that there is an increasing number of concurrent contact tracing protocols being developed; see in particular the later discussion of solutions based on proximity-based tracing (as in the figure above). In particular, there are multiple concurrent approaches using proximity-based signaling; our approach has certain advantageous properties, as it is particularly simple and requires very little data transfer. One point to emphasize is that, with this large number of emerging solutions, it is often difficult for the user to interpret what "privacy preserving" means in many of these protocols. One additional goal in providing the concrete protocols herein is to enable a broader discussion of both privacy sensitivity and security, along with a transparent discussion of the associated re-identification risks: the act itself of alerting a user to being at risk provides de-anonymizing information, as we discuss shortly.
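As a small numerical illustration of the adoption curves noted above (a back-of-the-envelope sketch, not an analysis from the protocol itself): if a fraction p of the population runs the app, a narrowcast message reaches roughly a p fraction of its intended audience, while a proximity contact is only recorded when both parties run the app, i.e., for roughly a p^2 fraction of contact pairs.

    # Linear (narrowcast) vs. quadratic (mutual proximity tracing) value
    # as a function of the adoption rate p.
    for p in (0.1, 0.2, 0.4, 0.6, 0.8):
        print(f"adoption {p:.0%}: linear value ~ {p:.2f}, pairwise coverage ~ {p * p:.2f}")

At 40% adoption, for instance, only about 16% of contact pairs are covered by mutual tracing, which is why the narrowcast and interview functionalities are expected to provide most of the early value.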
From a civil liberties standpoint, the privacy guarantees these protocols ensure are designed to be consistent with the disclosures already extant in contact tracing methods used by public health services (where some information from a positively tested citizen is revealed to other at-risk citizens). In short, we seek to empower public health services while maintaining civil liberties. We also note that these contact tracing solutions are not meant to replace conventional contact tracing strategies employed by public health organizations; not everyone has a phone, and not everyone who has a phone will use this app. Therefore, it is still critical to leverage conventional approaches along with the approaches outlined in this paper. In fact, two of our protocols are designed for assisting public health organizations (and are designed with input from public health organizations).
Figure: PACT tracing protocol. First, a user generates a random seed, which they treat as private information. Then all users broadcast random-looking signals to users in their proximity via Bluetooth and, concurrently, all users also record all the signals they hear being broadcast by other users in their proximity. Each person's broadcasts (their "pseudonyms") are a function of their private seed, and they change these broadcast pseudonyms periodically (e.g., every minute). Whenever a user tests positive, the positive user can voluntarily publish, on a public server, information which enables the reconstruction of all the signals they have broadcast to others during the infection window (precisely, they publish their private seed, and, using the seed, any other user can figure out which pseudonyms the positive user has previously broadcast). Now, any other user can determine whether they are at risk by checking whether the signals they heard are published on the server. Note that the "public lists" can be either lists from hospitals, which have confirmed seeds from positive users, or self-reports (see the discussion of reporting below). Credit: M. Eifler.
Throughout, we refer to an at-risk individual as one who has been in contact with an individual who has tested positive for COVID-19 (under criteria defined by public health programs, e.g., within a specified distance for more than a specified duration). Before we start this discussion, it is helpful to consider one principle which the proposed protocols respect: "if you do not report as being positive, then no information of yours will leave your phone." From a more technical standpoint, the statement that is consistent with our protocols is: if you do not report as being positive, then only random ("pseudonymized") signals are permitted to be broadcast from your phone. These random broadcasts are what allow proximity-based tracing; see the figure above for a description of the mobile tracing protocol. It is worthwhile to note that this principle is consistent, in spirit, with conventional contact tracing approaches, where only positively tested individuals reveal information to the public health authorities. Given the above principle, the discussion at hand largely focuses on what can be inferred when a positive disclosure occurs, along with how a malicious party can impact the system. We focus the discussion on the "mobile tracing" protocol for the following reasons: "narrowcasting" allows people to listen for events in their region, so it can be viewed as a one-way messaging system.
For "mobile-assisted interviews," all the data remains on the user's device, except for what they voluntarily reveal to public health authorities in order to enable contact tracing. All the claims below are consequences of basic security properties that can be formally proved about the protocol and, in particular, about the cryptographic mechanism generating these random-looking signals. We start with what private information is protected and what is shared voluntarily, following disclosure and consent. The inferential risk stems from the fact that the alert itself is correlated with other information, from which a user could deduce de-anonymizing information.
If I tested positive and I voluntarily disclose this information, what does the protocol reveal to others? Any other citizen who uses a mobile application following this protocol and who has been at risk is notified. In some versions, the time(s) at which the exposure(s) occurred may be shared. In the basic mobile tracing system that we envision, beyond exposure to specific individuals, no information is revealed to any other citizens or entities (authorities, insurance companies, etc.). It is also worth noting that, if you are negative, the protocol does not directly transmit any of your private information to any public database or any other third party; the protocol only transmits the random ("pseudonymized") signals that your phone broadcasts.
Re-identification and inferential risks. Can the identity of a positive citizen who chooses to report being positive be inferred by others? Identification is possible and is a risk to volunteers who would prefer to remain de-identified. Preventing proximity-based identification of this sort is not possible in any protocol, even in manual contact tracing as done by public health services, simply because the exposure alert may contain information that is correlated with identifying information. For example, an individual who had been in close proximity to only one person over the last two weeks can infer the identity of this positively tested individual. However, the positive individual's identity is never explicitly broadcast. In fact, identities are not even stored in the dataset: only the positive person's random broadcasts are stored.
Mitigating re-identification. Can the app be designed so as to mitigate re-identification risks for average users? While the protocol itself allows a sophisticated at-risk user to learn the time at which the exposure occurred, the app itself can be designed to mitigate the risk. For example, the re-identification risk could be mitigated by only informing the user that they are at risk, or by only providing the rough time of day at which the exposure occurred. This is a mild form of mitigation, which a malicious or sophisticated user could try to circumvent.
We now directly address questions about the potential for malicious hackers, governments, or organizations to compromise the system. In some cases, cryptographically secure procedures can prevent certain attacks, and, in other cases, malicious disclosure of information is prevented because the protocol stores no data outside of your device by default. Only cryptographically secured data from positively confirmed individuals is stored outside of devices.
Integrity attacks. If you are negative, can a malicious citizen listen to your phone's broadcasts and then report positive, pretending to be you?
No, this is not possible, provided you keep your initial seed private (see the protocol figure above). Furthermore, this holds even if the malicious party records all Bluetooth signals going into and out of your phone. This attack is important to rule out: if a malicious entity observes all Bluetooth signals sent from your phone, you would not want this entity to be able to report you as positive. The attack is not possible because the seed uniquely determines your broadcasts and remains unknown to the attacker, unless the attacker is able to break the underlying cryptographic mechanism, which is believed to be infeasible.
Inferential attacks. Can the location of a positive citizen who chooses to report being positive be inferred by others? It is possible for a malicious party to simultaneously record broadcasts at multiple different locations, including those that the positive citizen visited. Using these recordings, the malicious party could infer where the positive citizen was. The times at which the citizen visited these locations can also be inferred.
Replay and reliability attacks. If a citizen is alerted as being at risk, is it possible that the citizen was not in the proximity of a positive individual? There are a few unlikely attacks that can trigger a false alert. One is a replay attack. For example, suppose a malicious group of individuals colludes to pretend to be a single individual; precisely, suppose they all use the same private seed (see the protocol figure above). Then, if only one of these malicious individuals makes a positive report, multiple people can be alerted, even if those people were not in the proximity of the person who made the positive report. The protocol incorporates several measures to make such attacks as difficult as possible.
Physical attacks. What information is leaked if a citizen's device is compromised by a hacker, stolen, or physically seized by an authority? Generally, existing mechanisms protect access to the storage of a phone. Should these mechanisms fail, the device only stores enough information to reconstruct the signals broadcast over a period prior to the compromise equal to the length of the infection window (i.e., two weeks), in addition to the collected signals. This enables some additional inference attacks. It is not possible to learn whether the user has ever reported positive.
Given that we would like the protocol to be of use to different states and countries, we seek an approach which allows both for security in reporting and for flexibility for the app designer in regions where it may make sense to consider reports based on self-confirmed positive tests or self-confirmed symptoms.
Reporting. Does the protocol support both medically confirmed positive tests and self-confirmed positive tests? Yes, it supports both. The uploaded files contain signatures from the uploading party (i.e., from a hospital lab or from any app following the protocol). This gives an app designer the freedom to use information from health systems and information from individuals in possibly different ways. In less developed nations, it may be helpful to permit the app designer to allow reports based on less reliable signatures.
Reliability. How will the protocol handle issues of false positives and false negatives with regard to alerting? What about cases where users don't have (or don't use) mobile phones?
The protocol does not explicitly address this, but a deployment requires both thoughtful app design and responsible communication with the public. With regard to the former, the false positive and false negative rates have to be taken into account when determining how to issue at-risk reports. More generally, estimates of the relevant probabilities (or an otherwise interpretable report) can be helpful to a user; such reports can be particularly relevant for those in high-risk categories (such as the elderly and immuno-compromised individuals). Furthermore, not everyone has a smartphone, and not everyone with a smartphone will use this app. Thus, users of this app, if they have not received any notification of exposure to COVID-19-positive cases, should not assume that they have not been around such positive cases. This means, for example, that they should still be cautious and follow all appropriate current public health guidelines, even if the app has not alerted them to possible COVID-19 exposure. This is particularly important until there is sufficient penetration of the app in any local population. We now list threats that are outside the scope of the protocol, yet important to consider. Care should be taken to address these concerns:
• Trusted communication. Communication between users and servers must be protected using standard mechanisms (i.e., the TLS protocol [ ]).
• Spurious entries. Self-reporting allows a malicious user to report themselves positive when they are not, and generally may allow several fake reports (i.e., a flooding attack). Mitigation techniques should be introduced to reduce the risk of such attacks.
• Invalid authentication. Positive reports should be validated using digital signatures, e.g., by healthcare providers. This requires appropriate public-key infrastructure to be in place. Additional vulnerabilities related to misuse or misconfiguration of this infrastructure can affect the reliability of positive reports.
• Implementation issues. Implementation aspects may weaken some of our claims and need to be addressed. For example, signals we send over Bluetooth as part of our protocol may be correlated with other signals which de-anonymize the user.
We now provide an overview of the three functionalities of PACT. This section describes and discusses a privacy-sensitive mobile tracing protocol. Our protocol follows a pattern wherein users exchange IDs via Bluetooth communication. If a user is both infected (we refer to such users as positive, and otherwise as negative) and willing to warn others who may have been at risk via proximity to the user, then de-identified information is uploaded to a server to warn other users of potential exposure. This approach has been followed by a number of similar protocols; we describe the differences with some of them later in this document. In Appendix B, we discuss an alternative approach which may offer some efficiency and privacy advantages, at the cost of relying on signatures as opposed to hash functions. Low-level technical details, e.g., how values are broadcast, are omitted. Further, it is assumed that the communication between users and the server is protected using the Transport Layer Security (TLS) protocol. We first describe a variant of the protocol without entry validation, and discuss below how to easily extend it to validate entries.
• Parameters. We fix an understood time unit dt and define ∆ such that ∆ · dt equals the infection window. (Typically, this would be two weeks.) We also fix the bit length n of the identifiers.
(Typically, n = 128.) We also use a function G : {0,1}^n → {0,1}^{2n} which is assumed to be a secure cryptographic pseudorandom generator (PRG). If n = 128, we can use G(x) = SHA-256(x).
• Pseudorandom ID generation. Every user broadcasts a sequence of IDs id_1, id_2, . . . . To generate these IDs, the user initially samples a random n-bit seed s_0, and then computes (s_i, id_i) ← G(s_{i-1}) for i = 1, 2, . . . . After i time units, the user only stores s* ← s_{max{i−∆,0}}, the time t* at which s* was generated, the current s_i, and the time t_i at which s_i was generated. Note that if the device was powered off or the application disabled, we need to advance to the appropriate s_i.
• Pseudorandom ID collection. For every ID broadcast by a device in its proximity at time t, a user stores a pair (id, t) in its local storage S.
• Reporting. To report a positive test, the user uploads (s*, t_start = t*, t_end = t_i) to the server, which appends it to a public list L. The server checks that t_start and t_end are reasonable before accepting the entry. Once the report is made, the user erases its memory and restarts the pseudorandom ID generation procedure.
• Checking exposure. A user downloads L from the server (or the latest portion of it). For every entry (s*, t_start, t_end) in L, it generates the sequence of IDs id*_1, . . . , id*_∆ starting from s*, as well as estimates t*_i of the times at which each id*_i was initially broadcast. If S contains (id*_i, t) for some i ∈ {1, . . . , ∆} such that t and t*_i are sufficiently close, the user is alerted of potential exposure.
Setting delays. To prevent replay attacks, an entry (s*, t_start, t_end) should be published with a slight delay. This is to prevent an id*_∆ generated from s* from being recognized as a potential exposure by any user if it is immediately rebroadcast by a malicious party.
Entry validation. Entries can (and should) be validated by attaching a signature σ on (s*, t_start, t_end) when reporting, as well as (optionally) a certificate to validate this signature. An entry thus has the form (s*, t_start, t_end, σ, cert). Entries can be validated by multiple entities, by simply re-uploading them with a new signature. A range of designs and policies are supported by this approach. Upon an initial upload, a (weakly secure) signature with an app-specific key could be attached for self-reporting. This signature does not provide any real security (as we cannot guarantee that an app-specific signing key remains secret), but can be helpful to offer improved functionality. Third parties (like healthcare providers) can re-upload an entry with their signature after validation. An app can adopt different policies on how to display a potential exposure depending on how it is validated. We also do not specify here the infrastructure required to establish the validity of certificates, or how a user interacts with a validating party, as this is outside the scope of this description.
Fixed-length sequences of IDs. As stated, during the first ∆ − 1 time units a user will have generated a sequence of fewer than ∆ IDs. During this time, the number of IDs the user has generated from its current s* is determined by how long ago the user started the current pseudorandom ID generation procedure (either when they initially started using the protocol or when they last submitted a report). This may be undesirable information to reveal to a party that gains access to the sequence of IDs (e.g., if the user submits a report or if the party gains physical access to the user's device). To avoid revealing this information, a user may optionally iterate to s_∆ and use id_∆ as the first ID they broadcast when starting or restarting the pseudorandom ID generation procedure.
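The following is a minimal sketch of the ID generation, reporting and exposure-checking steps above, under the assumption n = 128 and G(x) = SHA-256(x), with the first half of the output used as the next seed and the second half as the broadcast ID. The concrete values of dt and ∆, and all variable names, are illustrative choices; the server-side timing checks, delayed publication and entry validation described above are omitted.

    import hashlib
    import os
    import time

    DT_SECONDS = 3600            # assumed time unit dt (one hour); illustrative only
    DELTA = 14 * 24              # number of IDs covering a two-week infection window

    def G(seed: bytes) -> tuple:
        """PRG step: map an n-bit seed to (next seed, broadcast ID), n = 128."""
        digest = hashlib.sha256(seed).digest()    # 256 bits = 2n
        return digest[:16], digest[16:]           # (s_i, id_i)

    def derive_ids(seed: bytes, count: int) -> list:
        """Derive `count` broadcast IDs starting from `seed`."""
        ids, s = [], seed
        for _ in range(count):
            s, id_i = G(s)
            ids.append(id_i)
        return ids

    # Reporting: a positive user publishes (s*, t_start, t_end).
    s_star = os.urandom(16)                       # stands in for s_{max{i-DELTA,0}}
    t_end = time.time()
    t_start = t_end - DELTA * DT_SECONDS
    report = (s_star, t_start, t_end)

    # Checking exposure: match the report against locally collected (id, t) pairs.
    def check_exposure(report, collected, slack=2 * DT_SECONDS):
        s_star, t_start, t_end = report
        ids = derive_ids(s_star, DELTA)
        alerts = []
        for i, rid in enumerate(ids):
            t_est = t_start + i * DT_SECONDS      # estimated broadcast time of id*_i
            for cid, t in collected:
                if cid == rid and abs(t - t_est) <= slack:
                    alerts.append((rid.hex(), t))
        return alerts

    # Example: the local storage S contains one ID heard from the reporting user.
    collected = [(derive_ids(s_star, DELTA)[3], t_start + 3 * DT_SECONDS)]
    print(check_exposure(report, collected))      # one potential exposure reported

The time comparison against the estimated broadcast time is what gives the replay protection discussed under "setting delays" its teeth: a rebroadcast ID only matches if it is heard close to when it was originally generated.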
so to avoid revealing this information, a user may optionally iterate to s ∆ and use id ∆ as the first id they broadcast when starting or restarting the pseudorandom id generation procedure. synchronized updates. suppose a user updates their seed every dt amount of time after whenever they happened to originally start the id generation process. then it may be possible to correlate two ids of a user by noticing that the times at which the ids were initially broadcast were separated in time by a multiple of dt. to mitigate this it would be beneficial to have an agreed schedule of when all users update their seed. for example, if dt is minutes then it might be agreed that everyone should update their seed at midnight utc, followed by : , : , and so forth. privacy and integrity properties of the protocol follow from the following two propositions. (their proofs are omitted and follow from standard techniques.) in the following discussion, it is convenient to refer to an id value id i output by a user as unreported if it is not within the ∆ id's generated by a seed the user has reported to the server. proposition (pseudorandomness) all unreported ids are pseudorandom, i.e., no observer (different than the user) can distinguish them from random looking strings (independent from the state of the user) without compromising the security of g. proposition (one-wayness) no attacker can produce a seed s which generates a sequence of ∆ ids that include an unreported id generated by an honest user (not controlled by the adversary) without compromising the security of g. to discuss the consequences of these properties on privacy and integrity, let us refer to users as either "positive" or "negative" depending on whether they decided to report as positive, by uploading their seed to the server, or not. • privacy for negative users. by the pseudorandomness property, a negative user u only broadcasts pseudorandom ids. these ids cannot be linked without knowledge of the internal state of u. this privacy guarantee improves with the frequency of updating the seed s i -ideally, if a different id i is broadcast each time, no linking is possible. this however results in less efficient checking for exposure by negative users. • privacy for positive users. upon reporting positive, the last ∆ ids generated by the positive user can be linked. ( we discuss what this means below, and possible mitigation approaches.) however, by pseudorandomness, this is only true for the ids generated within the infection window. older ids and newer ids cannot be linked with those in the infection window, and with each other. therefore, a positive user has the same guarantees as a negative user outside of the reported infection window. • integrity guarantees. it is infeasible for an attacker to upload to the server a value s * which generates an unreported id that equals one generated by another user. this prevents the attacker from misreporting ids of otherwise negative users and erroneously alerting their contacts. timing information and replay attacks. the timestamping is necessary to prevent replay attacks. in particular, we are concerned by adversaries rebroadcasting ids of legitimate users (to be tested positive) outside the range of their devices. this may create a high number of false exposures to be reported. an attack we cannot prevent is the following relay attack: an attacker captures an id of an honest user at location a, sends it over the internet to location b, where it is re-broadcast. 
However, as soon as there is sufficient delay, the attack is prevented by maintaining sufficiently accurate timing information. (One can envision several accuracy compromises in the implementation, which we do not discuss here.)
Strong integrity. Our integrity property does not prevent a malicious user from reporting a seed s* that generates an ID which has already been reported. Given an entry with seed s*, the attacker just chooses (for example) their own seed as the first half of G(s*). The threat of such attacks does not appear significant; however, they could be prevented with a less lightweight protocol, as we explain next. We refer to the resulting security guarantee as strong integrity. Each user generates a signing/verification key pair (sk, vk) along with the initial seed. Then, we include vk in the ID generation process; in particular, let (s_i, id_i) ← G(s_{i-1}, vk). An entry now consists of (s*, t_start, t_end, vk, σ), where σ is a signature (with signing key sk) on (s*, t_start, t_end, vk). Entries with invalid signatures are ignored. (This imposes slightly stronger assumptions on G: pseudorandomness under related seeds sharing part of the input, and binding of vk to s_i.) The CEN protocol, discussed later, is the only one that targets strong integrity, though its initial implementation failed to fully achieve it. (The issue has been fixed after our report.)
One explicit compromise we make is that the IDs of a positive user can be linked within the infection window, and that the start and end times of the infection window are known. For example, an adversary collecting IDs at several locations can detect that the same positive user has visited several of the locations at which it collects broadcast identifiers. This can be abused for surveillance purposes, but arguably surveillance itself could be achieved by other methods. The most problematic aspect is the linking of this individual with the fact that they are positive. A natural approach to avoid linking, as in [ ], is for the server to only expose the IDs, rather than a seed from which they are computed. However, this does not make them unlinkable. Imagine, at an extreme, that the storage on the server is append-only (which is a realistic assumption). Then the IDs belonging to the same user are stored sequentially. One can obfuscate this leakage of information in several ways, for example by having the server buffer a certain number of new IDs and shuffle them before release. Nonetheless, the actual privacy improvement is hard to assess without a good statistical model of upload frequency. This also increases the latency of the system, which directly harms its public health value. A user could also learn the time at which the exposure took place, and hence infer the identity of the positive user from other available information. We stress that the application can and should refuse to display the time of potential exposure, thus preventing a "casual attacker" from learning timing information. However, a malicious app can always remember the time at which an ID was seen.
Contact tracing interviews are laborious and often miss important events due to the limitations of human memory. Our plan to assist here is to provide information to the end user that can (with consent) be shared with a public health organization charged with performing contact tracing interviews. This is not an exposure of the entire observational log, but rather an extract of the information which is requested in a standard contact tracing interview.
we have been working with healthcare teams from boston and the university of washington on formats and content of information that are traditionally sought by public health agencies. ideally, such extraction can be done working with the user before a contact tracing interview even occurs to speed the process. healthcare authorities from nyc have informed us that they would love to have the ability to make public service announcements which are highly tailored to a location or to a subset of people who may have been in a certain region during specific periods of time. this capability can be enabled with a public server supporting (area x time,message) pairs. here "area" is a location, a radius (minimum meters), a beginning time and an ending time. only announcements from recognized public health authorities are allowed. anyone can manually query the public server to determine if there are messages potentially relevant to them per their locations and dwells at the locations over a period of time. however, simple automation can be extremely helpful as phones can listen in and alert based on filters that are dynamically set up based on privately-held locations and activities. upon downloading (area x time, message) pairs a phone app (for example) can automatically check whether the message is relevant to the user. if it is relevant, a message is relayed to the device owner. querying the public server provides no information to the server through the protocol itself, because only a simple copy is required. we discuss some alternative approaches to mobile tracing. some of these are expected to be adopted in existing and future contact-tracing proposals, and we discuss them here. hart et al. [ ] provides a useful high-level understanding of the issues involved in contact tracing. they discuss, among other topics, the value of using digital technology to scale contract tracing and the trade-offs between different classes of solutions. pact users upload their locally generated ids upon a positive report. an alternative is to upload collected ids of potentially at risk users. this approach (which we refer to as the dual approach) has at least one clear security disadvantage and one mild privacy advantage over pact. (the latter is only true if the system is carefully implemented, as we explain below.) disadvantages: reliability and integrity attacks. in the dual approach, a malicious user cannot be prevented from very easily reporting a very large number of ids which were not generated by users in physical proximity. these ids could have been collected by colluding parties elsewhere, at any time before the report. such attacks can seriously hurt the reliability of the system. in pact, to achieve a similar effect, the attacker needs to (almost) simultaneously broadcast the same id in direct proximity of all individuals who should be falsely alerted to be potentially at risk. pact ensures integrity of positive reporting by exhibiting a seed generating these ids, known only to the reporter. a user u cannot frame another negative user u as a positive user by including an id generated by u . in the dual approach, user u could be framed for example by uploading ids that have been broadcast in their surroundings. advantage: improved temporal ambiguity. both in the dual approach and in pact-like designs, a user at risk can de-anonymize a positive user from the time at which the matching id was generated/collected, and other contextual information (e.g., a surveillance video). 
the dual approach offers a mitigation to this using re-randomization of ids. we explain one approach [ ] . let g be a prime-order cyclic group with generator g (instantiated via a suitable elliptic curve). . each user u chooses a secret key s u as a random element in z p . . each broadcast id takes the form id i = (g ri , g risu ), where r , r , . . . are random elements of z p . . to upload an id with form id = (x, y) with a report, a positive user uploads instead a re-randomized version id = (x r , y r ), where r is a fresh random value from z p . . to determine whether they are at risk, user u checks whether an id of the form id = (x, y) such that y = x su is stored on the server. under a standard cryptographic assumption -the so-called decisional diffie-hellman (ddh) assumptionthe ids are pseudorandom. further, a negative user who learns they are at risk cannot tell which one of the ids they broadcast has been reported, as long as the reporting user re-randomized them and all ids have been generated using the same s u . note that incorrect randomization only hurts the positive user. crucially, however, the privacy benefit inherently relies on each user u re-using the same s u , and we cannot force a malicious user to comply. for example, to track movements of positive users, a surveillance entity can generate ids at different locations with form (x, y) where y = x s l and s l depends on the location l. identifiers on the server with form (x, x s l ) can then be traced back to location l. a functionally equivalent attack is in fact more expensive against pact, as this would require storing all ids of users broadcast at location l. we discuss an alternative centralized approach here, which relies on a trusted third party (ttp), typically an agency of a government. such a solution requires an initial registration phase with the ttp, where each user subscribes to the service. moreover, the protocol operates as follows: . users broadcast random-looking ids and gather ids collected in their proximity. . upon a positive test, a user reports to the ttp all of the ids collected in their proximity during the relevant infection window. the ttp then alerts the users who generated these ids, who are now at risk. in order for the ttp to alert potentially at risk users, it needs to be able to identify the owners of these identifiers. there a few technical solutions to this problem. • one option is to have the ttp generate all ids which are used by the users -this requires either storing them or (in case only seeds generating them are stored) a very expensive check to identity at risk users. • a more efficient alternative for the ttp (but with larger identifiers) goes as follows. the trusted thirdparty generates a public-key/secret-key pair (sk, pk), making pk public. it also gives a unique token τ u to each user u upon registration, which it remembers. then, the i-th id of user u is id i = enc(pk, τ u ). (note that encryption is randomized here, so every id i appears independent from prior ones.) the ttp can then efficiently identify the user who generated id i by decrypting it. privacy considerations. such a centralized solution offers better privacy against attackers who do not collude with the ttp -in particular, only pseudorandom identifiers are broadcast all times. moreover, at risk individuals only learn that one of the ids they collected belongs to a positive individual. 
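returning to the re-randomization mitigation of the dual approach sketched above, the following toy example shows the id form (g^r, g^(r·s_u)), the re-randomization step applied before upload, and the check a user performs to recognize their own reported ids. the tiny multiplicative group modulo a safe prime is used purely for illustration; as noted in the text, a real instantiation would use a suitable elliptic curve.

```python
# toy demonstration of re-randomizable ids; the small group (p = 2039, order-q
# subgroup with q = 1019, generator g = 4) is for illustration only; a real
# deployment would use a standard elliptic-curve group.
import secrets

p, q, g = 2039, 1019, 4

s_u = secrets.randbelow(q - 1) + 1                 # user's long-term secret key

def fresh_id():
    r = secrets.randbelow(q - 1) + 1
    return (pow(g, r, p), pow(g, r * s_u, p))      # id = (g^r, g^{r*s_u})

def rerandomize(identifier):
    x, y = identifier
    rho = secrets.randbelow(q - 1) + 1
    return (pow(x, rho, p), pow(y, rho, p))        # uploaded, unlinkable form

def recognizes(entry, secret):
    x, y = entry
    return pow(x, secret, p) == y                  # check y == x^{secret}

broadcast_id = fresh_id()                          # collected by a nearby reporter
uploaded_id = rerandomize(broadcast_id)            # what the reporter uploads
assert recognizes(uploaded_id, s_u)                # the owner is flagged at risk
assert not recognizes(uploaded_id, s_u + 1)        # other users are not
```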
a -risk users can still collude, learning some information from the time of being reported at risk, and correlate identifiers belonging to the same positive user, but this is harder. the biggest drawback of this solution, however, is the high degree of trust on the ttp. for example: • the ttp learns the identities of all at risk users who have been in proximity of the positive subject. • the ttp can, at any time and independently of any actual report, learn the identity of the user u who broadcasts a particular id, or at least link them to their token τ u . this could be easily exploited for surveillance of users adopting the service. security consideration. as in the dual approaches described above, it is trivial for a malicious party identifying as honest to report valid identifiers of other users (which may have been collected in a distributed fashion) to erroneously alert them as being at risk. replay attacks can be mitigated by encrypting extra meta-data along with τ u (e.g., a timestamp), but this would make ids even longer. if the ttp is malicious it can target specific users to falsely claim they are at risk or to refrain from informing them when they actually are at risk. it is also possible to design protocols based on the sensing of absolute locations (gps, and gps extended with dead reckoning, wifi, other signals per current localization methods) consistent with "if you do not report as being positive, then no information of yours will leave your phone" (see section ). for example, a system could upload location traces of positives (cryptographically, in a secure manner), and then negative users, whose traces are stored on their phones could intersect their traces with the positive traces to check for exposure. this could potentially be done with stronger cryptographic methods to limit the exposure of information about these traces to negative users; one could think of this as a more general version of private-set intersection (psi) [ , , ] . however, such solutions would still reveal traces of positives to a server. there are two reasons why we do not focus on the details of such an approach here: • current localization technologies are not as accurate as the use of bluetooth-based proximity detection, and may not be accurate enough to be consistent with medically suggested definitions for exposure. • approaches employing the sensing and collection of absolute location information would need to rely more heavily on cryptographic protocols to keep the positive users traces secure. however, this is an approach worth keeping in mind as an alternative, per assessments of achievable accuracies and relevance of the latter accuracies for public health applications. there are an increasing number of contact tracing applications being created with different protocols. we will briefly discuss a few of these and how their mobile tracing protocols compare with the approaches described in section . and . the privacy-sensitive mobile tracing protocols proposed by coepi [ ] , covidwatch [ ], as well as dp t [ ] , have a similar structure to our proposed protocol. we briefly describe the technical differences between all of these protocols and discuss the implications of these differences. similar to our proposed protocol, these are based on producing pseudorandom ids by iteratively applying a prg g to a seed. 
coepi and covidwatch use the contact event numbers (cen) protocol, in which the initial seed is derived from a digital signature signing key rak and g is constructed from two hash functions (which during each iteration incorporate an encoding of the number of iterations done so far and the verification key rvk which matches rak). another proposal is the dp t [ ] protocol, in which g is constructed from a hash function, a prf, and another prg. the latter prg is used so that a single iteration of g produces all the ids needed for a day. these ids are used in a random order throughout the day. both of these (under appropriate cryptographic assumptions) achieve the same sort of pseudorandomness and one-wayness properties as our protocol. the incorporation of rvk into g with cen is intended to provide strong integrity and allow a reporting user to include a memo with their report that is cryptographically bound to the report. two ideas for what such a memo might include are a summary of the user's self-reported symptoms (coepi) or an attestation from a third party verifying that the user tested positive (covidwatch). because a counter of how many times the seed has been updated is incorporated into g, a report must specify the corresponding counters. this leaks how long ago the user generated the initial seed, which could potentially be correlated with identifying information about the user (e.g., when they initially downloaded the app). an earlier version of cen incorrectly bound the digital signature key to the identifiers in a report. suppose an honest user has submitted a report for id j through id j (for j < j ) with a user chosen memo. given this report, an attacker could create their own report that verifies as valid, but includes the honest user's id i for some i between j and j together with a memo of the attacker's choosing. a fix was proposed after we contacted the team behind the cen protocol. the random order of a user's ids for a day by dp t is intended to make it difficult for an at risk individual to identify specifically when they were at risk (and thus potentially, by whom they were exposed). a protocol cannot hope to hide this sort of timing information from an attacker that chooses to record the time when they received every id they see; this serves instead as a mitigation against a casual attacker using an app that does not store this sort of timing information. in our protocol and cen, information about the exposure time is not intended to be as hidden at the protocol. in our protocol the time an id was used is even included as part of a report and used to prevent replay attacks, as discussed earlier. cen does not use timing information to prevent replay attacks, but considers that an app may choose to give users precise information about where they were exposed (so the user can reason about how likely this potential exposure was to be an actual exposure). a similar protocol idea was presented in [ ] . it differs from the aforementioned proposals in that individual ids are uploaded to the server, rather than a seed generating them (leading to increased bandwidth and storage). alternatives using bloom filters to reduce storage are discussed, but these inherently decrease the reliability of the system. dp t also recently included a similar protocol as an additional option, using cuckoo filters in place of bloom filters. the tracetogether [ ] app is currently deployed in singapore. 
it uses the bluetrace protocol designed by a team at the government technology agency of singapore. this protocol is closely related to the encryption-based technique discussed in section . . the private kit: safe paths app [ , ] intends to use an absolute-location-centric approach to mobile tracing. they intend to mitigate some of the downsides discussed in section . by allowing the reported location traces of positive users to be partially redacted. it is unclear what methodology they intend to use for deciding how to redact traces, and how they will navigate the trade-off in this redaction process between how easily a positive user can be identified from their trace and how much information must be removed from it (decreasing its usefulness). they intend to use cryptographic protocols (likely based on [ ] ) to minimize the amount of information revealed about positive users' traces. a group of scientists at the big data institute of oxford university has proposed the use of a mobile contact-tracing app [ , ] based on their analysis in [ ] . the nexttrace [ ] project aims to coordinate with covid- testing labs and users, providing software to enable contact tracing. the details of these proposals and the privacy protections they intend to provide are not publicly available. the projects we refer to are only a small selection of the mobile contact-tracing efforts currently underway. a more extensive listing of these projects is being maintained at [ ] , along with other information of interest to contact tracing. discussion and further considerations most protocols like ours store a seed on a server, which is then used to deterministically generate a sequence of identifiers. details differ in how exactly these sequences are generated (including the adopted cryptographic algorithms). however, it appears relatively straightforward for apps to be modified to support all of these different sequence formats. a potential challenge is that data from different protocols may provide different levels of protection (e.g., the lack of timing information may reduce the effectiveness against replay attacks). this difference in reliability may be surfaced via the user interface. in order to support multiple apps accessing servers for different services, it is important to adopt an interoperable format for entries to be stored on a server and, possibly, to develop a common api (a hypothetical example of such an entry format is sketched below). we acknowledge that ethical questions arise with contact tracing and in the development and adoption of any new technology. the question of how to balance what is revealed for the good of public health vs. individual freedoms is one that is central to public health law. we reiterate that privacy is already impacted by tracing practices. in some nations, positively tested citizens are required, either by public health policy or by law, to disclose aspects of their history. such actions and laws frame multiple concerns about privacy and freedom, and bring up important questions. the purpose of this document is to lay out some of the technological capabilities, which supports broader discussion and debate about civil liberties and the risks that contact tracing can pose to them. another concern is accessibility of the service: not everyone has a phone (or will have the service installed). one consequence of this is that the quality of contact tracing in a certain population inherently depends on factors orthogonal to the technological aspects, which in turn raises important questions about fairness.
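as a purely hypothetical illustration of such an interoperable format, an entry could carry the seed or verification key, the covered time range and the protocol variant with its cryptographic parameters. every field name below is an assumption made for this sketch; no such schema is defined in the document or by any of the projects mentioned.

```python
import json

# hypothetical interoperable server entry; every field name is an assumption
# made for illustration, not a format defined by any of the projects above.
entry = {
    "protocol": "seed-chain",   # which id-derivation scheme produced the ids
    "prg": "sha256",            # cryptographic algorithm identifier
    "payload": "9f2c1a",        # hex-encoded seed or verification key
    "t_start": 1588000000,      # start of the reported infection window
    "t_end": 1588600000,        # end of the reported infection window
    "signature": None,          # present only in strong-integrity variants
}
serialized = json.dumps(entry)  # what a common api might store and serve
```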
tracing is one part of a conventional epidemic response strategy, based on tests, tracing, and timeouts (ttt). programs involving all three components are as follows:
• test heavily for the virus. south korea ran over tests per person found with the virus.
• trace the recent physical contacts for anyone who tests positive. south korea conducted mobile contact tracing using telecom information.
• timeout the virus by quarantining contacts until their immune system purges the virus, rendering them non-infectious.
the mobile tracing approach allows this strategy to be applied at a dramatically larger scale than only relying on human contact tracers. this chain is only as strong as its weakest link. widespread testing is required and wide-scale adoption must occur. furthermore, strategies must also be employed so that citizens take steps to self-quarantine or seek testing (as indicated) when they are exposed. we cannot assume percent usage of the application and concomitant enlistment in ttt programs. studies are needed of the sensitivity of the approach's effectiveness to different levels of subscription in a population. stefano tessaro acknowledges support from a sloan research fellowship and from the nsf under grants cns- , cns- .
• bluetooth message: a bluetooth message consists of a fixed-length string of bytes. it is used with the bluetooth sensory log to discover if there is a match, which results in a warning that the user may have been in contact with an infected person.
• message: a message is a cryptographically signed string of bytes which is interpreted by the phone app. this is used for either a public health message (announced to the user if the sensory log matches) or a bluetooth message.
with the above defined, there are two common queries that the server supports as well as an announcement mechanism.
• getmessages(region, time) returns all of the (area, message) pairs that the server has added since time for the region. the app can then check locally whether the area intersects with the recorded sensory log of (location, time) pairs on the phone, and alert the user with the message if so.
• howbig(region, time) returns the (approximate) number of bytes worth of messages that would be downloaded on a getmessages call with the same arguments. howbig allows the phone app to control how much information it reveals to the server about locations/times of interest according to a bandwidth/privacy tradeoff. for example, the phone could start with a very coarse region, specifying higher-precision regions until the bandwidth required is acceptable, then invoke getmessages. (this functionality is designed to support controlled anonymity across widely varying population densities.)
• announce(area, message) uploads an (area, message) pair for general distribution. to prevent spamming, the signature of the message is checked against a whitelist defined with the server.
we propose an alternative to the protocol in section . . one main difference is that the server cannot generate the ids broadcast by a positive user, and only stores a short verification key used to identify ids broadcast by the positive user. while this does not prevent many of the inference scenarios we discussed above, this appears to be a desirable property. as we explain below, this protocol offers a different cost for checking exposure, which may be advantageous in some deployment scenarios.
this alternative approach inherently introduces risks of replay attacks which cannot be prevented by storing timestamps, because the server obtains no information about the times at which ids have been broadcast. to overcome this, we build on top of a very recent approach of pietrzak [ ] for replay-attack protection. (along similar lines, this can also be extended to relay-attack protection by including gps coordinates, but we do not describe this variant here.) • setup and parameters. we fix an understood time unit dt. we make use of a digital signature scheme specifying algorithms for key generation, signing, and verification, denoted kg, sign, and vrfy, respectively. we also use a hash they also determine the current time t i = t d + dt · (i − ). finally, the user samples n-bit random strings r i and r i and computes the identifier as where σ i = sign(sk d , r i ||h i ) and h i = h(r i , t i ). they broadcast (id i , r i , t i ). when day d ends the user deletes their signing key sk d . (the verification key vk d is not deleted, until an amount of time equal to the infection window has elapsed.) • pseudorandom id collection. for every id i = ((σ i , r i , h i ), r i , t i ) broadcast by a device in their proximity, a user first checks if t i is sufficiently close to their current time and if h i = h(r i , t i ). if so, they store id i in their local storage s. • reporting. to report a positive test, the user uploads each of their recent vk d to the server, which appends them to a public list l. once reported, the user erases their memory and restarts the pseudorandom id generation procedure. • checking exposure. a user downloads l from the server (or the latest portion of it). for every entry vk in l and every entry (σ, r, h) in s, they run vrfy(vk, σ, r||h). if this returns true, the user is alerted of potential exposure. efficiency comparisons. let ∆ be the number of ids broadcast over the infection window. let s = |s| be the size of the local storage. let l be the number of new verification keys a user downloads. to check exposure, the protocol from section . roughly runs in time where t g is the time needed to evaluate g. in contrast, for the protocol in this section, the time is where t vrfy is the time to verify a signature. one should note that t vrfy is generally larger than t g , but can still be quite fast. (for example, ed enables fast batch signature verification.) therefore, the usage of this scheme makes particular sense if a user does not collect many ids, i.e., s is small relative to ∆ · log(s). assumptions. we require the following two standard properties for the hash function h: • pseudorandomess: for any x and a randomly chosen r ∈ { , } n , the output h(r, x) looks random to anyone that doesn't know r. • collision resistance: it is hard to find distinct inputs to h that produce the same output. of our digital signature scheme we require the following three properties. the first is a standard property of digital signature schemes. the latter two are not commonly required of a digital signature scheme, so one needs to be careful when choosing a signature scheme to implement this protocol. we have verified that these properties are achieved by ed under reasonable cryptographic assumptions. • unforgeability: given vk and examples of σ = sign(sk, m) for attacker-chosen m, an attack cannot produce a new (σ , m ) for which vrfy(vk, σ , m ) returns true. 
• one-wayness: given examples of σ = sign(sk, m) for attacker-chosen m (but not given vk), an attacker cannot find vk for which vrfy(vk , σ, m) returns true for any of the example (σ, m). • pseudorandomess: the output of sign(sk, ·) looks random to an attacker that does not know vk or sk. privacy and security properties. we discuss the privacy and integrity properties this protocol has in common with the earlier protocol, as well as some newer properties not achieved by the earlier protocol. • privacy for negative users. by the pseudorandomness property, the signatures broadcast by a user u look pseudorandom. beyond that, u broadcasts two random strings and their view of the current time t i which is already known by any device hearing the broadcast. thus these broadcasts cannot be linked without knowledge of the internal state of u. as before, this privacy guarantee improves with the frequency of generating new ids. • privacy for positive users. upon reporting positive, the ids broadcast by a user within a single day can be linked to each other. ids broadcast on different days can be linked if the server does not hide which vk's were reported together. older ids from days before the infection window and newer ids from after the report cannot be linked with those in the infection window or with each other. therefore, a positive user has the same guarantees as a negative user outside of the reported infection window. • integrity guarantees. it is infeasible for an attacker to upload to the server a value vk which verifies an unreported id that was broadcast by another user. this prevents the attacker from misreporting ids of otherwise negative users and erroneously alerting their contacts. • replay protection. the incorporation of t in each id prevents an attacker from performing a replay attack where they gather ids of legitimate users (to be tested positive) and re-broadcast the ids at a later time to cause false beliefs of exposure. a vk reported to the server cannot be used to broadcast further ids that will be recognized by other users as matching that report. • non-sensitive storage. because h(r i , t i ) looks random, the information intentionally stored by the app together with an id does not reveal when the corresponding interaction occurred. (of course, it may be possible to infer information about t i through close examination of how the id was stored, e.g., where it was written in memory as compared to other ids.) information sharing across private databases assessing disease exposure risk with location histories and protecting privacy: a cryptographic approach in response to a global pandemic high-speed high-security signatures anonymous collocation discovery:taming the coronavirus while preserving privacy coepi: community epidemiology in action quantifying sars-cov- transmission suggests epidemic control with digital contact tracing efficient private matching and set intersection outpacing the virus: digital response to containing the spread of covid- while mitigating privacy risks rfc : edwards-curve digital signature algorithm (eddsa) delayed authentication: replay and relay attacks on dp- t phasing: private set intersection using permutationbased hashing apps gone rogue: maintaining personal privacy in an epidemic rfc : the transport layer security (tls) protocol version . . 
internet engineering task force (ietf) private kit: safe paths; privacy-by-design contact tracing decentralized privacy-preserving proximity tracing sustainable containment of covid- using smartphones in china: scientific and ethical underpinnings for implementation of similar approaches in other settings unified research on privacy-preserving contact tracing and exposure notification for covid- from web search to healthcare utilization: privacy-sensitive studies from mobile data we gratefully acknowledge dean foster for contributions that are central in designing the current protocol, along with contributions throughout the current document. the authors thank yael kalai for numerous helpful discussions, along with suggesting the protocol outlined in section . . we thank edward jezierski, nicolas di tada, vi hart, ivan evtimov, and nirvan tyagi for numerous helpful discussions. we also graciously thank m eifler for designing all the figures. sham kakade acknowledges funding from the washington research foundation for innovation in data-intensive discovery, the onr award n - - - , nsf grants #ccf- and #ccf . jacob sunshine acknowledges funding from nih (k da ) a number of practical issues and details may arise with implementation. . with regards to anonymity, if the protocol is implemented over the internet, then geoip lookups can be used to localize the query-maker to a varying extent. people who really care about this could potentially query through an anonymization service. . the narrowcast messages in particular may be best expressed through existing software map technology. for example, we could imagine a map querying the server on behalf of users and displaying public health messages on the map. . the bandwidth and compute usage of a phone querying the full database may be to high. to avoid this, it's reasonably easy to augment the protocol to allow users to query within a (still large) region.we mention one such approach below. . disjoint authorities. across the world, there may be many testing authorities which do not agree on a common infrastructure but which do wan to use the protocol. this can be accommodated by enabling the phone app to connect to multiple servers. . the mobile proximity tracing does not directly inform public authorities who may be a contact. however, it does provide some bulk information, simply due to the number of posted messages.there are several ways to implement the server. a simple approach, which works fine for not-to-many messages just uses a public github repository.a more complex approach supporting regional queries is defined next. anyone can ask for a set of messages relevant to some region r where r is defined by a latitude/longitude range with messages after some timestamp. more specific subscriptions can be constructed on the fly based on policies that consider a region r and privately observed periods of time that an individual has spent in a region. such scoped queries and messaging services that relay content based on location or on location and periods of time are a convenience to make computation and communication tractable. the reference implementation uses regions greater in size than typical geoip tables.to be specific, let's first define some concepts.• region: a region consists of a latitude prefix, a longitude prefix, and the precision in each. for example, new york which is at . n, - . 
e can be coarsened to n, - e with two digits of precision (the actual implementation would use bits).• time: a timestamp is specified in the number of seconds (as a bit integer) since the january , .• location: a location consists of a full precision latitude and longitude• area: an area consists of a location, a radius, a beginning time, and an ending time. key: cord- -aew xr n authors: garcía-durán, alberto; gonzález, roberto; oñoro-rubio, daniel; niepert, mathias; li, hui title: transrev: modeling reviews as translations from users to items date: - - journal: advances in information retrieval doi: . / - - - - _ sha: doc_id: cord_uid: aew xr n the text of a review expresses the sentiment a customer has towards a particular product. this is exploited in sentiment analysis where machine learning models are used to predict the review score from the text of the review. furthermore, the products costumers have purchased in the past are indicative of the products they will purchase in the future. this is what recommender systems exploit by learning models from purchase information to predict the items a customer might be interested in. the underlying structure of this problem setting is a bipartite graph, wherein customer nodes are connected to product nodes via ‘review’ links. this is reminiscent of knowledge bases, with ‘review’ links replacing relation types. we propose transrev, an approach to the product recommendation problem that integrates ideas from recommender systems, sentiment analysis, and multi-relational learning into a joint learning objective. transrev learns vector representations for users, items, and reviews. the embedding of a review is learned such that (a) it performs well as input feature of a regression model for sentiment prediction; and (b) it always translates the reviewer embedding to the embedding of the reviewed item. this is reminiscent of transe [ ], a popular embedding method for link prediction in knowledge bases. this allows transrev to approximate a review embedding at test time as the difference of the embedding of each item and the user embedding. the approximated review embedding is then used with the regression model to predict the review score for each item. transrev outperforms state of the art recommender systems on a large number of benchmark data sets. moreover, it is able to retrieve, for each user and item, the review text from the training set whose embedding is most similar to the approximated review embedding. online retail is a growing market with sales accounting for $ . billion or . % of total us retail sales in [ ] . in the same year, e-commerce sales accounted for . % of all retail sales growth [ ] . for some entertainment products such as movies, books, and music, online retailers have long outperformed traditional in-store retailers. one of the driving forces of this success is the ability of online retailers to collect purchase histories of customers, online shopping behavior, and reviews of products for a very large number of users. this data is driving several machine learning applications in online retail, of which personalized recommendation is the most important one. with recommender systems online retailers can provide personalized product recommendations and anticipate purchasing behavior. in addition, the availability of product reviews allows users to make more informed purchasing choices and companies to analyze costumer sentiment towards their products. 
the latter was coined sentiment analysis and is concerned with machine learning approaches that map written text to scores. nevertheless, even the best sentiment analysis methods cannot help in determining which new products a customer might be interested in. the obvious reason is that customer reviews are not available for products they have not purchased yet. in recent years the availability of large corpora of product reviews has driven text-based research in the recommender system community (e.g. [ , , ] ). some of these novel methods extend latent factor models to leverage review text by employing an explicit mapping from text to either user or item factors. at prediction time, these models predict product ratings based on some operation (typically the dot product) applied to the user and product representations. sentiment analysis, however, is usually applied to some representation (e.g. bag-of-words) of review text, but in a recommender system scenario the review is not available at prediction time. with this paper we propose transrev, a method that combines a personalized recommendation learning objective with a sentiment analysis objective into a joint learning objective. transrev learns vector representations for users, items, and reviews jointly. (figure caption: at training time, a function's parameters are learned to compute the review embedding from the word token embeddings such that the embedding of the user translated by the review embedding is similar to the product embedding. at the same time, a regression model g is trained to perform well on predicting ratings.) the crucial advantage of transrev is that the review embedding is learned such that it corresponds to a translation that moves the embedding of the reviewing user to the embedding of the item the review is about. this allows transrev to approximate a review embedding at test time as the difference of the item and user embedding despite the absence of a review from the user for that item. the approximated review embedding is then used in the sentiment analysis model to predict the review score. moreover, the approximated review embedding can be used to retrieve reviews in the training set deemed most similar by a distance measure in the embedding space. these retrieved reviews could be used for several purposes. for instance, such reviews could be provided to users as a starting point for a review, lowering the barrier to writing reviews. we address the problem of learning prediction models for the product recommendation problem. a small example of the input data typical to such a machine learning system is depicted in fig. . this is reminiscent of knowledge bases, with 'reviews' replacing relation types. two nodes in a knowledge base may be joined by a number of links, each representing one relation type from a small vocabulary. here, if two nodes are connected they are linked by one single edge type, in which case it is represented by a number of words from a (very) large vocabulary. there is a set of users u, a set of items i, and a set of reviews r. each rev (u,i) ∈ r represents a review written by user u for item i. hence, rev (u,i) = [t , · · · , t n ], that is, each review is a sequence of n tokens. in the following we refer to (u, rev (u,i) , i) as a triple. each such triple is associated with the review score r (u,i) given by the user u to item i.
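for concreteness, the training data described above can be thought of as a collection of (user, review token sequence, item) triples paired with a rating. the toy instances below reuse a few review snippets quoted later in the paper; users and items are invented placeholders.

```python
# toy instance of the data described above: triples (u, rev_(u,i), i) together
# with the associated review score r_(u,i); user and item ids are placeholders.
S = [
    (("u1", ["great", "cd"], "item7"), 5.0),
    (("u1", ["not", "good", "quality"], "item9"), 2.0),
    (("u2", ["best", "soap", "ever"], "item3"), 5.0),
]

users = {u for ((u, _, _), _) in S}
items = {i for ((_, _, i), _) in S}
vocab = {t for ((_, rev, _), _) in S for t in rev}
```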
transrev embeds all users, items and reviews into a latent space where the embedding of a user plus the embedding of the review is learned to be close to the embedding of the reviewed item. it simultaneously learns a regression model to predict the rating given a review text. this is illustrated in fig. . at prediction time, reviews are not available, but the modeling assumption of transrev makes it possible to predict the review embedding by taking the difference of the embedding of the item and the user. then this approximation is used as an input feature of the regression model to perform rating prediction (see fig. ). (figure caption: at test time, the review embedding is approximated as the difference between the product and user embeddings. the approximated review embedding is used to predict the rating and to retrieve similar reviews.) transrev embeds all nodes and reviews into a latent space r k (k is a model hyperparameter). the review embeddings are computed by applying a learnable function f to the token sequence of the review, h rev (u,i) = f(rev (u,i) ). the function f can be parameterized (typically with a neural network such as a recursive or convolutional neural network), but it can also be a simple parameter-free aggregation function that computes, for instance, the element-wise average or maximum of the token embeddings. we propose and evaluate a simple instance of f where the review embedding h rev (u,i) is the average of the embeddings of the tokens occurring in the review. more formally, h rev (u,i) = (1/|rev (u,i) |) Σ t∈rev (u,i) v t + h 0 , where v t is the embedding associated with token t and h 0 is a review bias which is common to all reviews and takes values in r k . the review bias is of importance since there are some reviews all of whose tokens are not in the training vocabulary. in these cases we have h rev (u,i) = h 0 . the learning of the item, review, and user embeddings is determined by two learning objectives. the first objective guides the joint learning of the parameters of the regression model and the review embeddings such that the regression model performs well at review score prediction, min Σ ((u,rev (u,i) ,i),r (u,i) )∈s (g(h rev (u,i) ) − r (u,i) ) 2 , where s is the set of training triples and their associated ratings, and g is a learnable regression function r k → r that is applied to the representation of the review h rev (u,i) . while g can be an arbitrarily complex function, the instance of g used in this work is g(h rev (u,i) ) = σ(h rev (u,i) ) ⊤ w + b (u,i) , where w are the learnable weights of the linear regressor, σ is the sigmoid function σ(x) = 1/(1 + e −x ), and b (u,i) is the shortcut we use to refer to the sum of the bias terms, namely the user, item and overall bias: b (u,i) = b u + b i + b . later we motivate the application of the sigmoid function to the review embedding. of course, in a real-world scenario a recommender system makes rating predictions on items that users have not rated yet and, consequently, reviews are not available for those items. the application of the regression model of eq. ( ) to new examples, therefore, is not possible at test time. our second learning procedure aims at overcoming this limitation by leveraging ideas from embedding-based knowledge base completion methods. we want to be able to approximate a review embedding at test time such that this review embedding can be used in conjunction with the learned regression model. hence, in addition to the learning objective ( ), we introduce a second objective that forces the embedding of a review to be close to the difference between the item and user embeddings.
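a minimal numpy sketch of the first objective's ingredients, the averaged review embedding with its shared bias and the regression function g, is given below. it assumes the reconstructed forms above (element-wise sigmoid of the review embedding followed by a linear layer plus the bias terms); the embedding dimension, initial values and bias values are placeholders.

```python
# minimal numpy sketch of the review embedding f and the regression model g,
# following the (reconstructed) formulas above; sizes and initial values are
# illustrative placeholders.
import numpy as np

k = 16                                    # embedding dimension
rng = np.random.default_rng(0)
V = {t: rng.normal(scale=0.1, size=k)     # token embeddings v_t
     for t in ["great", "cd", "not", "good"]}
h0 = np.zeros(k)                          # review bias shared by all reviews
w = rng.normal(scale=0.1, size=k)         # weights of the linear regressor
b_u, b_i, b0 = 0.0, 0.0, 3.0              # user, item and overall bias terms

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def review_embedding(tokens):
    known = [V[t] for t in tokens if t in V]
    # falls back to the review bias when no token is in the training vocabulary
    return (np.mean(known, axis=0) if known else 0.0) + h0

def g(h_rev):
    return sigmoid(h_rev) @ w + (b_u + b_i + b0)

h = review_embedding(["great", "cd"])
loss = (g(h) - 5.0) ** 2                  # squared error against the rating
```

the second, translation-based objective and the resulting test-time approximation are sketched after they are introduced below.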
this translation-based modeling assumption is followed in transe [ ] and several other knowledge base completion methods [ , ] . we include a second term in the objective that drives the distance between (a) the user embedding translated by the review embedding and (b) the embedding of the item to be small, ||e u + h rev (u,i) − e i ||, where e u and e i are the embeddings of the user and item, respectively. in the knowledge base embedding literature (cf. [ ] ) it is common that the representations are learned via a margin-based loss, where the embeddings are updated if the score (the negative distance) of a positive triple (e.g. (berlin, located_in, germany)) is not larger than the score of a negative triple (e.g. (berlin, located_in, portugal)) plus a margin. note that this type of learning is required to avoid trivial solutions. the minimization problem of eq. ( ) can easily be solved by setting e u = h rev (u,i) = e i = 0 ∀u, i. however, this kind of trivial solution is avoided by jointly optimizing eqs. ( ) and ( ), since a degenerate solution like the aforementioned one would lead to a high error with respect to the regression objective (eq. ( )). the overall objective can now be written as min Θ Σ ((u,rev (u,i) ,i),r (u,i) )∈s [ (g(h rev (u,i) ) − r (u,i) ) 2 + λ ||e u + h rev (u,i) − e i || ], where λ is a term that weights the approximation loss due to the modeling assumption formalized in eq. ( ). in our model, Θ corresponds to the parameters w, e, v, h 0 ∈ r k and the bias terms b. at test time, we can now approximate review embeddings of (u, i) pairs not seen during training by computing ĥ rev (u,i) = e i − e u . with the trained regression model g we can make rating predictions r̂ (u,i) for unseen (u, i) pairs by computing r̂ (u,i) = g(ĥ rev (u,i) ). contrary to training, the regression model g is now applied to ĥ rev (u,i) instead of h rev (u,i) , which is not available at test time. the sigmoid function of the regression function g adds a non-linear interaction between the user and item representation. without such an activation function, the model would consist of a linear combination of bias terms and the (ranking of) served recommendations would be identical for all users. all parameters of both parts of the objective are jointly learned with stochastic gradient descent. more details regarding the parameter learning are contained in the experimental section. the choice of transe as the underlying modeling assumption for this recommendation problem is not arbitrary. given the user and item embeddings, and without further constraints, it allows the approximate review embedding to be computed uniquely via eq. ( ). another popular knowledge graph embedding method is distmult [ ] . in applying such a modeling assumption to this problem, one would obtain the approximate review embedding by solving the following optimization problem: ĥ rev (u,i) = argmax h (e i • e u ) ⊤ h, where • is the element-wise multiplication. the solution to that problem would be any vector with infinite norm. therefore, one should impose constraints on the norm of the embeddings to obtain a non-trivial solution. however, previous work [ ] shows that such a constraint harms performance. similarly, most other knowledge graph embedding methods would require imposing constraints on the norm of the embeddings. the translation modeling assumption of transe facilitates the approximation of the review embedding without additional constraints, while its performance is on par with, if not better than, that of most other translation-based knowledge graph embedding methods [ ].
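the joint objective and the test-time approximation can be sketched as follows. the sketch again assumes the reconstructed equations above (squared rating error plus a λ-weighted translation term, and ĥ rev (u,i) = e i − e u at test time); all tensors, the value of λ and the bias term are toy placeholders.

```python
# sketch of the joint training loss and of the test-time rating prediction,
# following the reconstructed objective above; all values are toy placeholders.
import numpy as np

k, lam = 16, 0.1
rng = np.random.default_rng(1)
e_u = rng.normal(scale=0.1, size=k)       # user embedding
e_i = rng.normal(scale=0.1, size=k)       # item embedding
h_rev = rng.normal(scale=0.1, size=k)     # review embedding from f (training only)
w, b_ui = rng.normal(scale=0.1, size=k), 3.0
rating = 4.0

def g(h):
    return 1.0 / (1.0 + np.exp(-h)) @ w + b_ui

# training: regression term plus the lambda-weighted translation term
train_loss = (g(h_rev) - rating) ** 2 + lam * np.linalg.norm(e_u + h_rev - e_i)

# test time: the review is unavailable, so its embedding is approximated
h_hat = e_i - e_u
predicted_rating = g(h_hat)
```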
the first research theme related to transrev is knowledge graph completion. in the last years, many embedding-based methods have been proposed to infer missing relations in knowledge graphs based on a function that computes a likelihood score based on the embeddings of entities and relation types. due to its simplicity and good performance, there is a large body of work on translation-based scoring functions [ , ] . [ ] propose an approach to large-scale sequential sales prediction that embeds items into a transition space where user embeddings are modeled as translation vectors operating on item sequences. the associated optimization problem is formulated as a sequential bayesian ranking problem [ ] . to the best of our knowledge, [ ] is the first work in leveraging ideas from knowledge graph completion methods for recommender system. whereas transrev addresses the problem of rating prediction by incorporating review text, [ ] addresses the different problem of sequential recommendation. therefore the experimental comparison to that work is not possible. in transrev the review embedding translates the user embedding to the product embedding. in [ ] , the user embedding translates a product embedding to the embedding of the next purchased product. moreover, transrev gets rid of the margin-based loss (and consequently of the negative sampling) due to the joint optimization of eqs. ( ) and ( ), whereas [ ] is formalized as a ranking problem in a similar way to [ ] . subsequently, there has been additional work on translation-based models in recommender systems [ , ] . however, these works cannot incorporate users' feedback other than ratings into the learning, which has been shown to boost performance [ ] . there is an extensive body of work on recommender systems [ , , ] . singular value decomposition (svd) [ ] computes the review score prediction as the dot product between the item embeddings and the user embeddings plus some learnable bias terms. due to its simplicity and performance on numerous data sets-including winning solution to the netflix prize-it is still one of the most used methods for product recommendations. most of the previous research that explored the utility of review text for rating prediction can be classified into two categories. semi-supervised approaches. hft [ ] was one of the first methods combining a supervised learning objective to predict ratings with an unsupervised learning objective (e.g. latent dirichlet allocation) for text content to regularize the parameters of the supervised model. the idea of combining two learning objectives has been explored in several additional approaches [ , , ] . the methods differ in the unsupervised objectives, some of which are tailored to a specific domain. for example, jmars [ ] outperforms hft on a movie recommendation data set but it is outperformed by hft on data sets similar to those used in our work [ ] . supervised approaches. methods that fall into this category such as [ , ] learn latent representations of users and items from the text content so as to perform well at rating prediction. the learning of the latent representations is done via a deep architecture. the approaches differences lie mainly in the neural architectures they employ. there is one crucial difference between the aforementioned methods and transrev. transrev predicts the review score based on an approximation of the review embedding computed at test time. 
moreover, since transrev is able to approximate a review embedding, we can use this embedding to retrieve reviews in the training set deemed most similar by a distance metric in the embedding space. similar to sentiment analysis methods, transrev trains a regression model that predicts the review rating from the review text. contrary to the typical setting in which sentiment analysis methods operate, however, review text is not available at prediction time in the recommender system setting. consequently, the application of sentiment analysis to recommender systems is not directly possible. in the simplest case, a sentiment analysis method is a linear regressor applied to a text embedding (eq. ( ) ). we conduct several experiments to empirically compare transrev to state of the art methods for product recommendation. moreover, we provide some qualitative results on retrieving training reviews most similar to the approximated reviews at test time. we evaluate the various methods on data sets from the amazon product data , which has been extensively used in previous works [ ] [ ] [ ] . the data set consists of reviews and product metadata from amazon from may to july . we focus on the -core versions (which contain at least reviews for each user and item) of those data sets. there are product categories from which we have randomly picked . as all previously mentioned works, we treat each of these resulting data sets independently in our experiments. ratings in all benchmark data sets are integer values between and . as in previous work, we randomly sample % of the reviews as training, % as validation, and % as test data. we remove reviews from the validation and test splits if they involve either a product or a user that is not part of the training data. we follow the same preprocessing steps for each data set. first, we lowercase the review texts and apply the regular expression "\w+" to tokenize the text data, discarding those words that appear in less than . % of the reviews of the data set under consideration. for all the amazon data sets, both full reviews and short summaries (rarely having more than words) are available. since classifying short documents into their sentiment is less challenging than doing the same for longer text [ ] , we have used the reviews summaries for our work. we truncate these reviews to the first words. for lack of space we cannot include statistics of the preprocessed data sets. we compare to the following methods: a svd matrix factorization; hft, which has not often been benchmarked in previous works; and deepconn [ ] , which learns user and item representations from reviews via convolutional neural networks. we also include mpcn [ ] (which stands for multi-pointer co-attention networks) in the comparison, however, as indicated in previous work [ ] mpcn is a non-reproducible work . therefore, we simply copy numbers from [ ] , since they used the same data sets as the ones used in this work. additionally, we also include performance for transnets (t-nets) [ ] , whose numbers are also copied from [ ] . t-nets is similar to transrev in that it also infers review latent representations from user and item representations. different to transrev, it does not have any underlying graph-based modeling assumption among users, items and reviews. we set the dimension k of the embedding space to for all methods. we evaluated the robustness of transrev to changes in sect. . . alternatively, one could use off-the-shelf word embeddings (e.g. 
word vec [ ] or elmo [ ] ), but this would require to assume the existence of a large collection of text for effectively learning good word representations in an unsupervised manner. however, such a corpus may not be available for some low-resource languages or domainspecific use cases. for transrev's parameters were randomly initialized [ ] and learned with vanilla stochastic gradient descent. a single learning iteration performs sgd with all review triples in the training data and their associated ratings. for transrev we used a batch size of . we ran transrev for a maximum of epochs and validated every epochs. for svd we used the python package surprise , and chose the learning rate and regularization term from the same range of values. parameters for hft were learned with l-bfgs, which was run for , learning iterations and validated every iterations. for deepconn the original authors' code is not available and we used a third-party implementation . we applied the default hyperparameters values for dropout and l regularization and used the same embedding dimension as for all other methods. all methods are validated according to the mean squared error (mse). the experimental results are listed in table where the best performance is in bold font. transrev achieves the best performance on all data sets with the exception of the kindle store and automotive categories. surprisingly, hft is more competitive than more recent approaches that also take advantage of review text. most of these recent approaches do not include hft in their baselines. transrev is competitive with and often outperforms hft on the benchmark data sets under consideration. to quantify that the rating predictions made by hft and transrev are significantly different we have computed the dependent t-test for paired samples and for all data sets where transrev outperforms hft. the p-value is always smaller than . . it is remarkable the low performance of deepconn, mpcn and t-nets in almost all datasets. this is in line with the findings reported in very recent work [ ] , where authors' analysis reveals that deep recommender models are systematically outperformed by simple heuristic recommender methods. these results only confirm the existing problem reported in [ ] . we randomly selected the data sets baby, digital music, office and tools&home improvement from the amazon data and evaluated different values of k for user, item and word embedding sizes. we increase k from to and always validate all hyperparameters, including the regularization term. table list the mse scores. we only observe small differences in the corresponding model's performances. this observation is in line with [ ] . for most of the data sets the validated weighting term λ takes the value of either . or . . this seems to indicate that the regression objective is more important than the modeling assumption in our task, as it directly relates to the goal of the task. the regularization term is of crucial importance to obtain good performance and largely varies across data sets, as their statistics also largely differ across data sets. review embeddings, which are learned from word embeddings, are learned to be good predictors of user ratings. as a consequence the learned word embeddings are correlated with the ratings. to visualize the correlation between words and ratings we proceed as follows. first, we assign a score to each word that is computed by taking the average rating of the reviews that contain the word. 
second, we compute a -dimensional representation of the words by applying t-sne [ ] to the -dimensional word embeddings learned by transrev. figure depicts these -dimensional word embedding vectors learned for the amazon beauty data set. the corresponding rating scores are indicated by the color. the clusters we discovered in fig. are interpretable. they are meaningful with respect to the score: the upper right cluster is mostly made up of words with negative connotations (e.g. horrible, useless, . . . ), the lower left one contains neutral words (e.g. with, products, . . . ) and the lower right one contains words with positive connotations (e.g. awesome, excellent, . . . ). one of the characteristics of transrev is its ability to approximate the review representation at prediction time. this approximation is used to make a rating prediction, but it can also be used to propose a tentative review on which the user can elaborate. this is related to a number of approaches [ , , ] on explainable recommendations. we compute the euclidean distance between the approximated review embedding ĥ rev (u,i) and all review embeddings h rev (u,i) from the training set. we then retrieve the review text with the most similar review embedding (a sketch of this retrieval step is given below). we investigate the quality of the tentative reviews that transrev retrieves for the beauty and digital music data sets. the example reviews listed in table show that while the overall sentiment is correct in most cases, we can also observe the following shortcomings: (a) the function f chosen in our work is invariant to word ordering and, therefore, cannot learn that bigrams such as "not good" have a negative meaning. (b) despite matching the overall sentiment, the actual and retrieved review can refer to different aspects of the product (for example, "it clumps" and "gives me headaches"). related work [ ] extracts aspects from reviews by applying a number of grammatical and morphological analysis tools. these aspects are used later on to explain why the model suspects that a user might be interested in a certain product. we think this type of explanation is complementary to ours, and might inspire future work. (c) reviews can be specific to a single product. a straightforward improvement could consist of retrieving only existing reviews for the specific product under consideration.
table . reviews retrieved from the beauty (upper) and digital music (lower) data sets. in parentheses the ratings associated with the reviews.
review | closest training review in embedding space
skin improved ( ) | makes your face feel refreshed ( )
love it ( ) | you'll notice the difference ( )
best soap ever ( ) | i'll never change it ( )
it clumps ( ) | gives me headaches ( )
smells like bug repellent ( ) | pantene give it up ( )
fake fake fake do not buy ( ) | seems to be harsh on my skin ( )
saved my skin ( ) | not good quality ( )
another great release from saliva ( ) | can't say enough good things about this cd ( )
a great collection ( ) | definitive collection ( )
sound nice ( ) | not his best nor his worst ( )
a complete massacre of an album ( ) | some great songs but overall a disappointment ( )
the very worst best of ever ( ) | overall a pretty big disappointment ( )
what a boring moment ( ) | overrated but still alright ( )
great cd ( ) | a brilliant van halen debut album ( )
we believe that more sophisticated sentence and paragraph representations might lead to better results in the review retrieval task.
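the retrieval step that produces the table above amounts to a nearest-neighbour search in the embedding space; a minimal sketch follows, in which the embeddings and the small pool of training reviews are invented placeholders.

```python
# sketch of retrieving the training review closest to the approximated review
# embedding; embeddings and review texts are invented placeholders.
import numpy as np

k = 16
rng = np.random.default_rng(2)
e_u, e_i = rng.normal(size=k), rng.normal(size=k)

train_reviews = ["best soap ever", "it clumps", "a great collection"]
train_embeddings = rng.normal(size=(len(train_reviews), k))   # h_rev from training

h_hat = e_i - e_u                                             # approximated embedding
distances = np.linalg.norm(train_embeddings - h_hat, axis=1)  # euclidean distances
tentative_review = train_reviews[int(np.argmin(distances))]
```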
as discussed, a promising line of research has to do with learning representations for reviews that are aspect-specific (e.g. "ease of use" or "price"). transrev is a novel approach for product recommendation that combines ideas from knowledge graph embedding methods, recommender systems and sentiment analysis. transrev achieves state-of-the-art performance on the data sets under consideration while having fewer (hyper)parameters than more recent works. most importantly, one main characteristic of transrev is its ability to approximate the review representation during inference. this approximated representation can be used to retrieve reviews in the training set that are similar with respect to the overall sentiment towards the product. such reviews can be dispatched to users as a starting point for a review, thus lowering the barrier to writing new reviews. given the known influence of product reviews on the purchasing choices of users [ , ] , we think that recommender systems will benefit from such a mechanism.

references:
- user models: theory, method, and practice
- topicmf: simultaneously exploiting ratings and reviews for recommendation
- classifying sentiment in microblogs: is brevity an advantage? in: cikm
- translating embeddings for modeling multi-relational data
- empirical analysis of predictive algorithms for collaborative filtering
- transnets: learning to transform for recommendation
- are we really making much progress? a worrying analysis of recent neural recommendation approaches
- jointly modeling aspects, ratings and sentiments for movie recommendation (jmars)
- composing relationships with translations
- combining two and three-way embedding models for link prediction in knowledge bases
- understanding the difficulty of training deep feedforward neural networks
- traversing knowledge graphs in vector space
- translation-based recommendation
- knowledge base completion: baselines strike back
- matrix factorization techniques for recommender systems
- research and development in intelligent systems xxxii
- ratings meet reviews, a combined approach to recommend
- visualizing data using t-sne
- hidden factors and hidden topics: understanding rating dimensions with review text
- inferring networks of substitutable and complementary products
- image-based recommendations on styles and substitutes
- distributed representations of words and phrases and their compositionality
- an empirical comparison of knowledge graph embeddings for item recommendation
- deep contextualized word representations
- lit@eve : explainable recommendation based on wikipedia concept vectors
- factorizing personalized markov chains for next-basket recommendation
- fast maximum margin matrix factorization for collaborative prediction
- interpretable convolutional neural networks with dual local and global attention for review rating prediction
- representation learning of users and items for review rating prediction using attention-based convolutional neural network
- latent relational metric learning via memory-based attention for collaborative ranking
- multi-pointer co-attention networks for recommendation
- explaining reviews and ratings with paco: poisson additive co-clustering
- explicit factor models for explainable recommendation based on phrase-level sentiment analysis
- joint deep modeling of users and items using reviews for recommendation

acknowledgements. the research leading to these results has received funding from the european union's horizon innovation action programme under grant agreement no -smooth project.
this publication reflects only the author's views and the european community is not liable for any use that may be made of the information contained herein. key: cord- - pidolqb authors: maghdid, halgurd s.; ghafoor, kayhan zrar title: a smartphone enabled approach to manage covid- lockdown and economic crisis date: - - journal: sn comput doi: . /s - - - sha: doc_id: cord_uid: pidolqb the emergence of novel covid- causes an over-load in health system and high mortality rate. the key priority is to contain the epidemic and prevent the infection rate. in this context, many countries are now in some degree of lockdown to ensure extreme social distancing of entire population and hence slowing down the epidemic spread. furthermore, authorities use case quarantine strategy and manual second/third contact-tracing to contain the covid- disease. however, manual contact-tracing is time-consuming and labor-intensive task which tremendously over-load public health systems. in this paper, we developed a smartphone-based approach to automatically and widely trace the contacts for confirmed covid- cases. particularly, contact-tracing approach creates a list of individuals in the vicinity and notifying contacts or officials of confirmed covid- cases. this approach is not only providing awareness to individuals they are in the proximity to the infected area, but also tracks the incidental contacts that the covid- carrier might not recall. thereafter, we developed a dashboard to provide a plan for policymakers on how lockdown/mass quarantine can be safely lifted, and hence tackling the economic crisis. the dashboard used to predict the level of lockdown area based on collected positions and distance measurements of the registered users in the vicinity. the prediction model uses k-means algorithm as an unsupervised machine learning technique for lockdown management. in an unprecedented move, china locks down the megacity named wuhan, in which the novel coronavirus was first reported, in the hopes stopping the spread of deadly coronavirus. during the lockdown, all railway, port, and road transportation were suspended in wuhan city. with the increasing number of infections and fast person-to-person spreading, hospitals are overwhelmed with patients. later, the disease has been identified in many other countries around the globe [ , ] . subsequently, the world health organization (who) announced that the virus can cause a respiratory disease with clinical presentation of cough, fever, and lung inflammation. as more countries are experienced dozens of cases or community transmission, who characterized covid- disease as a pandemic. in such unprecedented situation, doctors and health care workers are putting their life at risk to contain the disease. furthermore, to isolate infected people and combatting the outbreak, many hospitals are converted to covid- quarantine ward. moreover, a surge of covid- patients has introduced long queues at hospitals for isolation and treatment [ ] . with such high number of infections, emergency responders have been working non-stop sending patients to the hospital and overcrowded hospitals refused to in more patients. for instance, recently, in italy, medical resources are in short supply, and hospitals have had to give priority to the researchers can access the implementation and programming code in https ://githu b.com/halgu rd /lockd own_covid . people with a significant fever and shortness of breath over others with less severe symptoms [ ] . 
as the covid- continues to spread, countries around the glob are implementing strict measures intensify the lockdown, from mass quarantine to city shutdown, to slow down the fast transmission of coronavirus [ , ] . during the lockdown, people are only allowed to go out for essential work such as purchasing food or medicine. ceremonies and gatherings of more than two people are not permitted. these strict rules of quarantine only allow few to move around the city including delivery drivers providing vital lifeline. on the other hand, few countries, such as japan, has declared a state of emergency in many cities in an attempt to tackle the spread of the virus. although covid- started as a health crisis, it possibly acts as a gravest threat to the world economy since global financial crisis [ ] . covid- epidemic affects all sectors of the economy from manufacturing and supply chains to universities. it is also affect businesses and daily lives especially in countries where the covid- has hit the hardest. the shortage of supply chain has knock-on effects on economic sector and the demand side (such as trade and tourism). this makes a supply constraint of the producer and causing a restraint in consumer's demand, this may lead to demand shock due to psychological contagion. to prevent such widespread fallout, central banks and policymakers have been rolling out emergency measures to reassure businesses and stabilize financial markets to support economy in the phase of covid- . currently, most countries are in the same boat with leading responsibility of group twenty and international organizations [ ] . to meet the responsibility, many companies and academic institutions around the world made efforts to produce covid- vaccine. however, health experts state that it may take time to produce an effective vaccine. as an effective vaccine for covid- is not probably to be in market until the begin of next year, management of lockdown is an imperative need. thus, public health officials combat the virus by manual tracking of recent contacts history of positive covid- cases. this manual contacttracing is very useful at the early spreading stage of the virus. however, when the number of confirmed cases was increased tremendously in some countries, manual contacttracing of each individual is labor-intensive and requires huge resources [ ] . for example, an outbreak of the covid- at a funeral ceremony in an avenue in erbil, kurdistan region left regional government with hundred of potential contacts. this situation or many other scenarios of massive number of cases burden the government on trying to manual tracking all contacts [ ] . it is risky that health authorities cannot easily trace recent covid- carrier cases, so that its probability of occurrence and its impact can hardly be measured. technology can potentially be useful for digital contacttracing of positive coronavirus cases [ ] . smartphone can use wireless technology data to track people when they near each other. in particular, when someone is confirmed with positive covid- , the status of the smartphone will be updated and, then, the app will notify all phones in the vicinity. for example, if someone tests positive of covid- and stood near a person in the mall earlier that week. the covid- carrier would not be able to memorize the person's name for manual contact-tracing. in this scenario, the smartphone contact-tracing app is very promising to notify that person [ ] . 
this automated virus tracking approach could really transform the ability of policymakers and health authorities to contain and control the epidemic. in this situation, a dashboard is required to assist governments and health authorities to predict when lockdown and selfquarantine will end. this study first reviews the current solutions to combat covid- . then, we developed a smartphone-based approach to automatically and widely trace the contacts for confirmed covid- cases. particularly, contact-tracing approach creates a list of individuals in the vicinity and notifying contacts or officials of confirmed covid- cases. this approach is not only providing awareness to individuals they are in the proximity to the infected area, but also tracks the incidental contacts that the covid- carrier might not recall. thereafter, we developed a dashboard to provide a plan for policymakers' officials on how lockdown/mass quarantine can be safely lifted, and hence tackling the economic crisis. applying mass quarantine to people who might be exposed to contiguous covid- in specific areas without any plan and information of infected people in those areas will lead to economic collapse. for example, if there are limited confirmed covid- cases in some areas, restrictions on mass gatherings should be eased and consequently relaxing social distancing among people to allow them for necessary shopping and using transportation. from a technical standpoint, we summarize the most important contributions of this paper as follows: . we build a tracking model based on positional information of registered users to conduct contact-tracing of confirmed covid- cases. . we propose a smart lockdown management to predict level of mass quarantine in those areas. . to notify contacts for confirmed cases, we also developed a notification model to cluster lockdown regions. the rest of this paper is organized as follows. the section "related work" provides the literature review on recent advances of developed ai systems for covid- detection. this is followed by presenting an overview of the proposed approach and details of the designed algorithm in the section "proposed smartphone-based contact-tracing". the section "experiments and deployment" presents the experiments which are conducted in the paper. finally, the section "conclusion" concludes the paper. countries practice many restrictions to respond the fast transmission of covid- pandemic including quarantining people with toughest level of social restrictions, closing public and private sectors, and early diagnosis of infected people via recent technologies. however, none of these solutions will be considered as permanent cure due to bad effecting of the daily life. apparently, such solutions have dramatic and chronic impact on social and economic dimensions. therefore, there is a need for digital contact-tracing to tackle the afore-said issues. in this section, recent trends on contact-tracing are investigated and compared with the proposed approach. several solutions ranging from company's products to an academic research studies have been proposed in mitigate the negative consequences of covid- . in particular, an application in singapore named smartphone-based contact-tracing is developed, aarogya setu [ ] is also used in india to support the difficulty of covid- situations. furthermore, some solutions are under development in united kingdom in collaboration with giant companies including google and apple [ ] . 
in [ ] , a new system has been implemented using onboard smartphone bluetooth technology to track people who exposed confirmed cases. the system can notify the nearby users in the public area when the infected users are approaching and the area will be quarantined to control the spreading of the virus in the vicinity. however, such study will not provide a comprehensive solution to predict the lockdown area and will not updating the prediction issue, periodically. in another attempt, authors in [ ] proposed a new decentralized approach to track the contact-tracing, which is named caudht (contact tracing application using a distributed hash table) . the approach is trying to preserve the privacy issue of the users (including public health users and infected users), since the system is exchanging data in a blind signature mechanism [ , ] . furthermore, the approach uses the distributed hash table method to messaging between the users. however, if such approach is implemented on the distributed server, it needs huge computation and incurs huge cost. in comparison to their proposed system, the proposed approach is working on temporal tracked information of the registered users, which is not required a large space on the server. furthermore, most of the computations of the system including notifying users can be run on the smartphone users. in [ ] , the authors modeled on how covid- spreads over populations [ ] in countries in terms of the transmission speed and containing its spreading. in the model, r is representing the reproduction number, which is defined the ability of the virus in infecting other people as a chain of contagious infection. infected individuals rapidly infect a group of people over very short period of time, which then yields an outbreak. on the contrary, the infection would be in control if the probability gets closer of one person to infect less than one other person [ ] . this is exactly happening in fig. ; when people (black color) who have come into contact with an infected person (red color), the infection would be spread rapidly. one important aspect is how the number of infected people looks like depends on several factors, such as the number of vulnerable people in the communities, the time takes to recover a person without symptoms, the social contacts and possibility of infecting them with coronavirus. furthermore, another factor will affect fast spreading of coronavirus is the frequency of visiting crowded places such as malls and minimarkets [ ] . thus, policymakers and public health authorities are responsible to manage and plan a convenient way to contain the epidemic. moreover, countries at the early stage of virus spreading need to control the epidemic by typically isolating and testing suspected cases tracing their contact and quarantine those people in case they are infected. testing and contact-tracing at wide scale, the better the chance of containment. in the case of covid- , research studies have been conducted for containment or controlling the fast spreading, and hence helping policymakers and societies in ending this epidemic [ ] . in [ ] , the authors have investigated the importance of confirmed covid- case isolation that could play a key role in controlling the disease. they have utilized a mathematical model to measure the effectiveness of this strategy in controlling the transmission speed of covid- . to achieve this goal, a stochastic transmission model is developed to overcome the fast person-to-person transmission of covid- . 
according to their research study, controlling virus transmission is within weeks or by a threshold of accumulative cases. however, controlling the spread of the virus using this mathematical approach is highly correlated to other factors like pathogen and the reaction of people. one key role to track infected people and predict ending lockdown is contact-tracing. when a patient is diagnosed with infectious disease like covid- , contact-tracing is an important step to slowing down the transmission [ ] . this technique seeks to identify people who have had close contact with infected individuals and who, therefore, may be infect themselves. this targeted strategy reduces the need for stay at home periods. however, manual contact-tracing is subject to a person's ability to recall everyone they have come in contact over a week period. in [ ] , the authors exploited the cellphone's bluetooth to constantly advertise the presence of people. these anonymous advertisements, named chirps in bluetooth, are not containing positional or personally identifiable information. every phone stores all the chirps that it has sent and overheard from nearby phones. their system uses these lists to enable contact-tracing for people diagnosed with covid- . this system not only traces infected individuals, but also estimates distance between individuals and amount of time which they spent in close proximity to each other. when a person is diagnosed with covid- , doctors would coordinate with the patient to upload all the chirps sent out by their phone to the public database. meanwhile, people who have not been diagnosed can their phones do a daily scan of public database, to see if their phones have overheard any of the chirps used by people later diagnosed by covid- . this indicates that they were in close prolonged contact with that anonymous individual. figure shows the procedure of exchanging anonymous id among users for contact-tracing. as stated in the aforementioned section, manual contacttracing is labor-intensive task. in this section, we detail out each part of the proposed smartphone-based digital contacttracing, as shown in fig. . the main idea of the proposed framework in fig. to enable digital contact-tracing to end lockdown and the same time preventing the virus from spread-ing. the best thing to do seems to be let people go out for their business, but any body tests positive of covid- , we would be able, through proposed framework, to trace fig. a framework of contact-tracing using smartphone-based approach everybody in contact with the confirmed case and managing the lockdown and mass quarantine. this will confirm preventing the spread of the virus to the rest of the people. the first step of the proposed contact-tracing model is registration of users. there is no doubt registration and coverage of high percentage of population is very significant for effective pandemic control. users provide information such as name, phone number, post code, status of the covid- disease (positive, negative, or recovered). effectiveness of the application and digital contact-tracing depends on two factors speed and coverage. for the proposed framework, we utilize global navigation satellite system (gnss) receiver for outdoor environment, whereas bluetooth low energy is used in indoors. 
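the anonymous-identifier matching idea described above, where phones store overheard "chirps" and check them against a public list uploaded for diagnosed users, can be illustrated as follows; the identifiers are plain placeholder strings, whereas a real deployment would use rotating random tokens.

```python
# minimal sketch of matching locally overheard anonymous identifiers against the
# identifiers published for diagnosed users; not a production protocol.
def check_exposure(overheard_ids, published_positive_ids):
    """Return the overheard identifiers that were later reported for positive cases."""
    return set(overheard_ids) & set(published_positive_ids)

overheard = ["a91f", "77c2", "d003"]          # stored on the phone
published = ["d003", "e55a"]                  # fetched daily from the public database
if check_exposure(overheard, published):
    print("possible close contact with a diagnosed case; consider testing or isolation")
```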
in our proposed model, bluetooth technology does not need make a connection setup between the users, while the system only requires the discovery process to retrieve the mac address of the nearby users and then performs the process of matching the infected users with their mac addresses. speed depends on how to reduce the time required for contact-tracing from few days to hours or minutes. the more people register in the system, the better performance of the system in terms of both speed and coverage of contact-tracing. in the second step, global positioning system (gps) receiver is used by the proposed model to track either individuals or a group of people visiting to a common place. the gps service class updates user coordinates to the database in every few seconds. once a registered user reports gets infected with covid- , his test result would be send to the public database in central computer server. other registered users will regularly check those central server provider for possible positive covid- cases they were in contact in the past weeks. server is responsible to compare the infected id with its list of stored ids. a push notification will be send, by the server, to those who were in contact with a person tests positive. it is important to note that the information would be revealed to the central server is an id of the phone. in another scenario, tracking users' position information could be periodically stored on the server for the purpose of exchanging notifications. furthermore, this means that only the infected users' information would need to be stored on the server. certainly, the records of infected area should be updated periodically. therefore, the system does not need huge computations on the server because of the issue of tracking infected users would be run on the smartphone. the only function that should be run of the server is the lockdown area prediction function. fire-based cloud messaging is used to send push notification to multiple devices even the apps are paused or running in the background. many apps send push notification, which indicate an alert to the users. this is happen when a person is approaching someone who is infected with covid- or nearby a lockdown area. to protect the privacy of those who have the coronavirus, we only include an alerting message into the push notification. this certainly would be very useful for entire population to make informed decision about not getting close to covid- area. however, this notification would help the public health professionals rather than replace it. the proposal is also including a lockdown prediction model. the model is working based on the collect geographic in-formation and crowding level of the registered users in the system. there are many algorithms to perform the cluster-ing on collected data including k-means clustering, mean-shift clustering, density-based spatial clustering of applications with noise (dbscan), expectation-maximization (em) clustering using gaussian mixture models (gmm), and agglomerative hierarchical clustering. however, the k-means clustering is the fast method among the other algorithm to find and allocate points with respect to the discovered clusters or group of points [ ] . for the reason of time and space complexity, in this study, the k-means clustering algorithm is used and implemented to prediction process. 
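a minimal sketch of this clustering step, assuming scikit-learn's k-means and a handful of made-up (latitude, longitude) fixes, is shown below; it illustrates the idea rather than the server-side code of the proposed system.

```python
# minimal sketch: cluster tracked user positions with k-means before a per-cluster
# lockdown decision; k=2 mirrors the two test areas used later in the experiments.
import numpy as np
from sklearn.cluster import KMeans

positions = np.array([                      # (latitude, longitude) fixes sent by the app
    [39.7392, -104.9903], [39.7401, -104.9911], [39.7389, -104.9899],   # denver-like
    [39.1911, -106.8175], [39.1925, -106.8169], [39.1902, -106.8181],   # aspen-like
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(positions)
print(kmeans.labels_)            # cluster assignment of each tracked position
print(kmeans.cluster_centers_)   # centroid of each candidate lockdown area
```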
in this study, k-means as an unsupervised machine learning algorithm is used to cluster the users' positions information and predict that the area should be locked down or not based on the same empirical thresholds. this section presents the details of how the proposed approach will be implemented. the proposal includes two main parts. first, deploying an application on android-based smartphone which will be used by the users and track/send mobility information of the users to the system. while the second side is a web portal (including a comprehensive dashboard) to monitor and predict the visited area that should be locked down or not. a. smartphone application . an android application is implemented on the smartphone. the application lets the users to register their information into the proposed system including name, postcode or zip code, phone number, age, bluetooth mac address, gender, and covid- status. the bluetooth mac address is automatically captured through the application without user interaction. the covid- status includes three options which might be covid- , none covid- , and recovered. figure a shows a snapshot of the application form for the registration process. . once the users have completed the registration process, they can enter into the position tracking model. the tracking model is to send user's position information into the database of the system as well as shows the google map regarding to their positions, as shown in fig. b . . beside this, the users are also can receive the notification or alert about the areas which have been visited by infected users. the notification is working in the background, i.e., the user may be paused the application and uses other application on the smartphone. however, when the user opens the application and enters the infected area will receive the alert dialog. figure c and fig. d show an example of the notification and alert dialog. the notification and dialog alert models are also configure both outdoors and indoors. for example, for outdoors, the gnss position information of the users is used to measure the distance between any two users' positions, and then, if the distance is less than m, then the notification or the alert dialog would be raised. however, for indoors, the application scans for bluetooth devices in the vicinity, and then, the result of the scan is matching with pre-registered mac addressed in the system. if the matched mac addresses have covid- or recovered cases, then the notification model and the alert dialog will notify the users about having covid- or recovered users in the scan area. a web portal for the system's administrators is designed and implemented using html , php, javascript, and google map api. this part of the system is to monitoring and tracing the registered users only in terms of how the areas (which have been visited by users) should be lockdown or not? to this end, an unsupervised machine learning (uml) algorithm has been implemented in the system. there are several uml algorithms including neural networks, anomaly detection, clustering, etc. however, for this system, k-means clustering algorithm is used to predict the lockdown approach for the visited area. the k-means algorithm, first, reads the tracked users' position information and their status covid- . then, in the next step, it will calculate the centroid position of the areas based on the dasv seeding method. 
the dasv method is an efficient algorithm for selecting the best centroid position among a set of nearby positions in the vicinity. in this study, two different spaces were selected via the dasv method, since only two crowded areas were tested. the centroid positions are then updated according to which positions are nearest to each of them. the pseudo code of the k-means clustering algorithm is shown in algorithm . once the clustering of the tracked users' position information has completed, a set of clusters is produced. then, for each cluster, the distances between the positions of different users are calculated. this is used to count how many times the users in the vicinity approach each other (from now on called aeo). for this study, five users (user a with a yellow marker, user b with an orange marker, user c with a pink marker, user d with a green marker, and user e with a blue marker) participated in the system in two different areas in the usa. therefore, two different scenarios with the five users were conducted for the k-means algorithm, as shown in fig. . in the first scenario, the users walk in the denver area in colorado, usa, while in the second scenario, they are located in the aspen area in colorado, usa. each user sends their location information (including latitude and longitude) every s, and the duration of each scenario is min of walking. therefore, each user sends approximately records of location information to the server. a threshold for the approaching distance was initialized to m, i.e., if user a comes within about m of user b, c, d, or e, the users are considered too near to each other. for the two scenarios, if aeo is greater than , the system assumes that the area is too crowded and predicts that it should be locked down. however, if the value of aeo is less than ten times, the area should not be locked down. over ten trial experiments, the model predicts that the denver area in the first scenario should be locked down, since the five users approached each other times during their walk and passed the distance threshold (i.e., m). in the second scenario, tested in parallel over the same trials, the model predicted that the aspen area does not need to be locked down, since the users walked far from each other. the results of both scenarios are shown in fig. (the results of the prediction model for both scenarios). as an initial study, only two different scenarios in two different areas are analyzed; more complex scenarios and hypotheses could be examined in the future. lockdown area prediction using recent technologies, especially onboard smartphone technologies, is a necessity for most countries. such management is very important for the economic sector and the demand side, including trade and tourism. this practical research has shown that the lockdown decision for an intended area can be predicted using machine learning algorithms such as the k-means clustering algorithm. the algorithm is implemented on a server, and the server receives the tracked location information of the smartphone users. this is followed by sending back notifications from the server to the users to notify them about the crowded area and control the spreading of the coronavirus covid- .
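the per-cluster approach-event (aeo) count and lockdown rule described above can be sketched as follows; the distance and aeo thresholds are placeholders because the exact values did not survive extraction, and the haversine distance stands in for whatever distance measure the deployed system uses.

```python
# minimal sketch: count approach events (aeo) between different users inside one cluster
# and flag the area for lockdown when the count exceeds an empirical limit.
import math
from itertools import combinations

DIST_THRESHOLD_M = 2.0     # assumed "too close" distance (placeholder)
AEO_LIMIT = 10             # assumed number of approach events that triggers lockdown

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def count_aeo(records):
    """records: list of (user_id, lat, lon) tuples inside one cluster."""
    aeo = 0
    for (u1, la1, lo1), (u2, la2, lo2) in combinations(records, 2):
        if u1 != u2 and haversine_m(la1, lo1, la2, lo2) <= DIST_THRESHOLD_M:
            aeo += 1
    return aeo

def lockdown_recommended(records):
    return count_aeo(records) > AEO_LIMIT
```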
furthermore, this management also gives feedback to policymakers about whether an area should be locked down or not. the time and space complexity of the algorithm implemented on the server depends on the number of participating users. to this end, the proposed approach temporarily uses the tracked location information only for the purpose of lockdown prediction. therefore, the approach preserves the privacy of the participating smartphone users. a set of experiments and trials has been conducted to demonstrate the validity of the proposed approach. however, further study on setting up the server, managing the implemented algorithm, and providing robust security and privacy guarantees is needed. at the emergence of covid- , many countries worldwide commonly practiced social distancing, mass quarantine, and even strict lockdown measures. smart lockdown management is a pressing need to ease lockdown measures in places where people are practicing social distancing. in this paper, we developed a smartphone-based approach to inform people when they are in proximity to an area infected with covid- . we also developed a dashboard to advise health authorities on how a specific area can safely get people back to their normal life. the proposed prediction model uses positional information and distance measurements of the registered users in the proximity. policymakers and public health authorities would be able to benefit from the proposed dashboard to get the latest statistics on covid- cases and lockdown recommendations in different areas. a weak point of this study is the privacy issue of tracking users' position information; this issue could be addressed by applying encryption algorithms in the near future. another weak point of this proposal is that further study needs more experimental and complex scenarios to verify the validity of the proposed system, for example, when the number of tracked user records is larger, when the prediction model is intended for a bigger city such as new york city in the united states or london in the united kingdom, or when deep learning algorithms are used rather than only the k-means algorithm. hopefully, in the near future, these requirements will be considered for the public health care system.

references:
- deep learning-based model for detecting novel coronavirus pneumonia on high-resolution computed tomography: a prospective study
- novel coronavirus in the united states
- covid- pneumonia level detection using deep learning algorithm
- lockdowns can't end until covid- vaccine found, study says
- diagnosing covid- pneumonia from x-ray and ct images using deep learning and transfer learning algorithms
- can we compare the covid- and crises?
- what is contact tracing?
- number of covid- cases reaches in kurdistan region; iraq's total now
- skin melanoma assessment using kapur's entropy and level set-a study with bat algorithm
- apple and google partner on covid- contact tracing technology
- role of telecom network to manage covid- in india: aarogya setu
- covid- contact tracing and data protection can go together
- decentralized contact tracing using a dht and blind signatures
- a trustworthy system with mobile services facilitating the everyday life of a museum
- an efficient blockchain-based approach for cooperative decision making in swarm robotics
- what we scientists have discovered about how each age group spreads covid-
- covid- optimizer algorithm, modeling and controlling of coronavirus distribution process
- artificial intelligence for coronavirus outbreak
- a novel ai-enabled framework to diagnose coronavirus covid using smartphone embedded sensors: design study
- ai-driven tools for coronavirus outbreak: need of active learning and cross-population train/test models on multitudinal/multi-modal data
- feasibility of controlling covid- outbreaks by isolation of cases and contacts
- safe paths: a privacy-first approach to contact tracing
- a k-mean-directions algorithm for fast clustering of data on the sphere

conflict of interest: the authors declare that they have no conflict of interest. moreover, this research was not funded by any funding agency.

key: cord- -vgs w b authors: ma, rongyang; deng, zhaohua; wu, manli title: effects of health information dissemination on user follows and likes during covid- outbreak in china: data and content analysis date: - - journal: int j environ res public health doi: . /ijerph sha: doc_id: cord_uid: vgs w b

background: covid- has hit china hard and spread across the whole world. articles were posted on many official wechat accounts to transmit health information about this pandemic, and the public also sought related information via social media more frequently. however, little is known about what kinds of information satisfy them better. this study aimed to explore the characteristics of health information dissemination that affected users' information behavior on wechat. methods: two-wave data were collected from the top wechat official accounts on the xigua website. the data included the change in the number of followers and the total number of likes on each account in a -day period, as well as the number of each type of article and headlines about the coronavirus. the data were used to develop regression models and conduct a content analysis to identify information characteristics in quantity and content. results: for nonmedical institution accounts in the model, report and story types of articles had positive effects on users' following behaviors, and the number of headlines on the coronavirus positively impacted liking behaviors. for medical institution accounts, report and science types had a positive effect, too. in the content analysis, several common characteristics were identified. conclusions: characteristics in terms of the quantity and content of health information dissemination contribute to users' information behavior. since the outbreak of the novel coronavirus (covid- )-infected pneumonia (ncp) in december , it has quickly spread across the world.
the world health organization declared the outbreak of covid- as a global public health emergency. more than million cases have been confirmed as of june [ ]. covid- has attracted attention worldwide, and information and discussions about it have been spreading on the internet, especially on social media. the ubiquity and ease of access make social media a powerful complement to traditional methods in information dissemination [ ] . according to a report released by csm media research, . % of chinese people purposefully seek pandemic information online, and approximately . % use wechat more frequently during the outbreak than before it occurred [ ] . social media is widely used for disseminating health information [ ] . as one of the most popular social platforms in china, wechat has more than . billion monthly active users [ ] and has become a frequently used information dissemination platform [ ] . wechat contains a specific module called wechat official account [ ] , which is a platform operated by institutions, communities, or individuals. wechat official accounts are widely used to share stories, report news, or disseminate various types of information. information on these accounts can be posted by anyone, including experts, novices, and even saboteurs [ ] . wechat has changed channels of health information dissemination and manners of obtaining feedback [ ] . for instance, evidence-based clinical practice guidelines in medical fields are traditionally spread by publishing in peer-reviewed journals, sending emails or paper notices to physicians, and advertising through news media outlets [ ] . however, wechat official accounts now enable guidelines for covid- to be shared on this platform when they are simultaneously published by health authorities. in response to the pandemic, people prefer to receive real-time news and instructions on personal protection [ ] . as such, many wechat official accounts have posted articles about ncp. however, different account operators tend to post various types of articles in different numbers. for example, some accounts have reported the number of infected cases every day to keep people informed about the pandemic state. some accounts have instructed the public to protect themselves. some accounts have refuted fake news to avoid confusion and inappropriate interventions. health information posted on these accounts can have a great impact on receivers' behavior because of its real-time nature and various forms [ ] . they can express their appreciation and interest by liking an article or following an account [ ] . we found that the number of followers on many official accounts changed dramatically within a week. meanwhile, the number of likes differs greatly among articles. in this work, we aimed to determine whether and how health information dissemination affected users' information behavior in terms of following an account and liking a post. researchers studied the influence of health information on information behaviors on different social media platforms, such as facebook and microblogging sites. the findings are shown in table . zika, mers, and chikungunya messages motivate the public to search for related information frequently and to post actively. bragazzi et al. [ ] mahroum et al. [ ] jang et al. [ ] mommy blog users with a personal connection to the health issue tend to post articles about it. burke-garcia et al. 
[ ] adoption and sharing behavior pregnancy-related information influences expectant mothers to adopt and share from the perspective of perceived influence and prenatal attachment. zhu et al. [ ] harpel. [ ] lupton. [ ] commenting behavior microblog information correlated with the vaccine event or environmental health in china can significantly influence users' comments. an et al. [ ] wang et al. [ ] prevention behavior instagram facebook intervention messages on breast cancer can effectively affect prevention behavior and lead to high exposure scores in consideration of the influence of leaders' opinion. wright et al. [ ] however, we found that few researchers concentrated on wechat in china. the above did not study detailed characteristics in the information. these studies mainly focused on the effect of health information on information behavior. however, during the pandemic, users may concentrate on different types of information, and their reaction to a given information may vary. for example, more people have followed wechat official accounts to give continuous attention to the pandemic [ ] , users' behavior on social networks is the inclination to use social network services and various activities [ ] . researchers studied users' information behavior on some popular social media platforms. for example, bian et al. [ ] found that the tendency to discuss promotional lynch syndrome-related health information on twitter shows users' prevention awareness. iftikhar et al. [ ] clarified that health information on whatsapp, facebook, and twitter can urge users to verify it on google. meanwhile, gan [ ] summarized three factors, namely, hedonic, social, and utilitarian gratifications, which affect the tendency of wechat users to like a post. users' information behavior is manifested everywhere on the internet [ ] . their behavior on wechat official accounts includes acquiring information, liking a post [ ] , and following an account. different behaviors may reflect different inclinations. for example, reading an article shows users' interest in a certain health theme [ ] . liking a post reflects their preference and appreciation [ , ] . after reading an article, users can like it to show their appreciation for the important message [ ] , and following accounts may indicate that users want to know what is being posted and their willingness to pay continuous attention [ ] . however, to the best of our knowledge, few studies have focused on analyzing the influence of information on users' information behavior to explore specific characteristics that satisfy wechat users. thus, this study aimed to address this issue. for this purpose, we developed multiple and simple linear regression models. we chose the number of different types of articles and the aggregated number of headlines on ncp posted on the selected accounts in a -day period as independent variables (a total of seven) to denote the health information source and reflect the dissemination state. we also chose the number of new followers and likes in this period as dependent variables to represent users' information behavior. then, we analyzed the relationship between information and behavior in quantity. we selected the number of related articles because it is a critical indicator in evaluating information dissemination [ ] . besides, for the impact of content on liking behaviors, we chose all of the headlines on ncp which won more than , likes to conduct our content analysis. 
information can affect users' information behavior on other media [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . we want to explore whether information conveyed in each type of articles posted on wechat can play the role, impacting users' following and liking behavior. thus, on wechat official accounts, we drew the following hypotheses-h to h . these articles will be classified into different types in the later part of this paper. h : the headlines with a great number of likes may possess common characteristics in content that can impact users' liking behavior. we collected data from the xigua website (data.xiguaji.com) in china. it is a big data platform that provides operational data on wechat official accounts. the data include the number of articles posted in the last days, comments, the number of likes, and other information on each account. xigua is an open-access website for researchers, and official accounts on this website can be classified into different fields, such as economy, sports, and health. we focused on health and used data on monthly rankings. we used bazhuayu, a chinese web crawler software for data collection, to collect data within the top accounts as shown in figure . the outbreak of the disease in china occurred on january . since then, information regarding the pandemic has attracted considerable attention. at this time, the online reaction of the public may be greatly intense as they were faced with this severe condition suddenly. in a short period, their behaviors may be easier and more obvious to observe than before. thus, we selected january , and january , as two time nodes to collect and classify account information, including the name, rank, operator, and number of followers. these accounts can be classified into three types based on their operators: nonmedical institution, medical institution, and individual accounts. different types of accounts are operated by different stakeholders. nonmedical institution accounts are operated by companies and governments; medical institution accounts are administrated by hospitals, including maternal and child care service centers; and individual accounts are managed by individuals. table presents the number of accounts. then, we calculated the change in followers in - january . because we intended to study information on ncp, we filtered several accounts to identify the influence of information on the pandemic and deleted those who did not post any article related to ncp. for the remaining accounts, . % ( / ) were nonmedical institution accounts, . % ( / ) were medical institution accounts, and . % ( / ) were individual accounts. figures and show the screenshots of the rank list and data collection page, respectively. following an account and liking a post can represent users' activity. the change in the number of followers in the -day period and the aggregated number of likes in the headlines that are correlated with ncp can reflect users' information behavior. thus, we used them as two dependent variables. we recorded the state of articles on every account and counted the number of posts on ncp. we classified them into six types; namely, counter-rumor, report, science, story, instruction, and others. we classified articles that struck sensationalism or misinformation and clarified a fact as a counterrumor article. we classified articles on news about the state of the pandemic, several facts, or a press conference conducted by the national health commission or other governmental institutions as a report. 
we also grouped an interview with professionals as a report. we categorized posts on scientific outcomes about ncp, explanations of this new virus, or information about psychology under science. we identified shared self-or public-description articles about how physicians resisted the pandemic in hospitals as a story. we identified posts that instructed the public to protect themselves or published a diagnosis and treatment guideline as instruction. other posts, such as commentary, appealing for aid, advocating, and encouraging articles, were grouped under others. we classified posts that integrated more than one type of topic based on titles and main contents. this classification standard was approved by all the authors. we defined six independent variables for the six article types. following an account and liking a post can represent users' activity. the change in the number of followers in the -day period and the aggregated number of likes in the headlines that are correlated with ncp can reflect users' information behavior. thus, we used them as two dependent variables. we recorded the state of articles on every account and counted the number of posts on ncp. we classified them into six types; namely, counter-rumor, report, science, story, instruction, and others. we classified articles that struck sensationalism or misinformation and clarified a fact as a counter-rumor article. we classified articles on news about the state of the pandemic, several facts, or a press conference conducted by the national health commission or other governmental institutions as a report. we also grouped an interview with professionals as a report. we categorized posts on scientific outcomes about ncp, explanations of this new virus, or information about psychology under science. we identified shared self-or public-description articles about how physicians resisted the pandemic in hospitals as a story. we identified posts that instructed the public to protect themselves or published a diagnosis and treatment guideline as instruction. other posts, such as commentary, appealing for aid, advocating, and encouraging articles, were grouped under others. we classified posts that integrated more than one type of topic based on titles and main contents. this classification standard was approved by all the authors. we defined six independent variables for the six article types. moreover, we counted the number of headlines on ncp to explore its correlation with likes. each account will post many articles on ncp every day, and headline is the first one with a conspicuous title and illustration. we recorded and counted the total number of likes in each article in this period. we defined headlines as another independent variable. table presents the collected and processed sample data. the scale of change in the number of followers was , ; for the number of likes, the scale was . before estimating the models, we tested whether the variables were normal. we used a one-sample kolmogorov-smirnov test to examine the normality of variables. table shows the results. the sample size was (n = ). all p values were below . . therefore, all variables were normal and could be estimated in the linear regression models. we developed a multiple linear regression model to explore the relationship between the change in the number of followers and the six types of articles. meanwhile, we developed a simple linear regression model for the aggregated number of likes. we proved the normality of variables. 
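as an illustration of the normality check mentioned above, the following sketch applies a one-sample kolmogorov-smirnov test to a placeholder variable with scipy; it is not the spss workflow used in the study.

```python
# minimal sketch: one-sample k-s test of a (standardized) variable against the normal
# distribution, as a stand-in for the spss normality check described above.
import numpy as np
from scipy.stats import kstest, zscore

def ks_normality(values):
    values = np.asarray(values, dtype=float)
    return kstest(zscore(values), "norm")     # returns (statistic, p_value)

followers_change = [120, 340, 80, 510, 95, 230, 60, 410]   # illustrative sample
print(ks_normality(followers_change))
```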
models are shown in the following two equations:

y_i = α_0 + α_1 · counter-rumor_i + α_2 · report_i + α_3 · science_i + α_4 · story_i + α_5 · instruction_i + α_6 · others_i + ε_i

y'_i = β_0 + β_1 · headlines_i + ε'_i

y_i represents the change in the number of followers in the -day period. counter-rumor_i, report_i, science_i, story_i, instruction_i, and others_i denote the number of counter-rumor, report, science, story, instruction, and other types of articles, respectively. y'_i represents the aggregated number of likes in headlines in this period, and headlines_i indicates the total number of headlines related to the pandemic. in the first equation, i = 1, 2, . . . , n indexes all accounts, α_0 to α_6 are the parameters to be estimated, and ε_i is the corresponding residual. in the second equation, i = 1, 2, . . . , n indexes all accounts, β_0 and β_1 are the parameters to be estimated, and ε'_i is the corresponding residual. we designed our research to determine the effect of information quantity on users' information behaviors, and we were also interested in the effect of content. we found an interesting phenomenon: among accounts whose articles were usually unpopular, one article received a large number of likes that did not correspond to the popularity of the account. for example, west china hospital lagged in the rank list, and most of its posted headlines were plain. nevertheless, on january , it posted a headline that received , likes, which might be a crucial factor affecting our regression result. we can hardly find another article that received such an unexpected number of likes. we therefore examined why these articles were exceptional in terms of users' liking behavior. to further conduct our study, we browsed the selected headlines that received more than , likes in this period and explored the characteristics of their content that could affect liking behavior. we examined a total of headlines. we coded them from perspectives: the account group that an article comes from, original/non-original articles, the article type, and the length of articles. meanwhile, we recorded the form of multimedia applied in each article to present information (including the number of videos, pictures and graphics). the codebook is presented in tables a -a in appendix a. the intercoder reliability was tested to be ideal. the coding results and some statistics are shown in table . we used spss . to analyze the data. table shows the estimation results based on the least squares method and stepwise regression. we developed model - to represent nonmedical institutions, medical institutions, and individual accounts, respectively. however, not all models fit well. for nonmedical institution accounts in model , the variables of report and story types had a significant effect (b = . , p = . ; b = . , p = . ) and played a positive role; the remaining variables were insignificant. for the medical institution accounts in model , the variables of report and science types were significant (b = . , p = . ; b = . , p < . ) and positive. however, for individual accounts in model , we did not obtain any result. model and had adjusted r² of . and . , respectively, denoting an acceptable fit. we were unable to obtain a satisfactory result for model . thus, we partially confirmed h . this section explored h . table shows the simple linear regression result based on the least squares method and stepwise regression. among the three groups, only nonmedical institution accounts in model showed significance. the variable of headlines played a positive role (b = . , p < . ). the adjusted r² was . , denoting an acceptable fit. we did not discover significance for the medical institution and individual accounts.
thus, h was partially confirmed. we found some impact factors of information dissemination on behavior, but we did not obtain a significant result in model and when we analyzed headlines and likes. it may because of some exceptional articles with a large number of likes recorded in table that led to insignificance when we analyzed model . when we discarded this datum, the analysis result was significant. thus, this factor could remarkably affect our results. some findings in content analysis are as follows. of the articles, ( %) were posted by nonmedical institution accounts. of these articles, were posted by dingxiang doctor, which was the most active account. dingxiang doctor was also the second-most popular in the rank list. besides, one account named dingxiang yuan posted article. these two accounts are affiliated with the same company called hangzhou lianke meixun biomedical technology corporation. in addition to these articles, ( %) was posted by medical institution accounts, and ( %) were posted by personal accounts. however, the only article posted by west china hospital received the most number of likes and reached , . the reason why articles from the medical institution group accounted for the least proportion may be that these accounts usually post affairs about their affiliated hospitals, which may be less interesting in the public opinion. compared with them, the public tend to prefer articles from nonmedical institution accounts. these accounts usually post various types of articles about common sense or short stories, which are easy for the public to understand and receive. this may be the reason why the public pay more attention to them and their articles. among the articles, ( %) were original, and ( %) were not. we did not identify an evident preference for originality. in these headlines, instruction, story, others, and counter-rumor types accounted for % ( / ), % ( / ), % ( / ), and % ( / ), respectively. report articles had the same proportion, accounting for % ( / ). for the two others, one article presented a timeline since the pandemic broke out, whereas the other article revealed several latent dangers after the city was locked down. story-type articles were confirmed to be positive in regression analysis. an instruction-type article could provide suggestions during a public health emergency. this article type might be the most popular because it met users' demands to seek prevention. wu [ ] believed that perceived usefulness is a precondition of users' overall satisfaction. perceived usefulness can affect users' attitude and determine the continuance of using an information system [ ] . reading behavior can show the perceived usefulness from users [ ] , and liking an article may denote their gratifications [ ] . instruction articles will inspire users' perceived usefulness and promote an account's popularity. we studied the length of each article and the method of transmitting information in table . the length varied among headlines; articles in the ( %) was coded as " ", possessing - characters. many articles limited the content length by using visual aids. for example, all articles applied infographics, including images and graphics. the post "can you go out without a mask? experts recommended the proper wearing of masks." by west china hospital had the most images (up to ). infographics and other visual aids, such as videos, can promote health information communication [ ] . 
using visuals alongside conventional text yields an ideal outcome from the perspective of health information promotion [ ] . infographics and videos can help users visualize information and facilitate its straightforward understanding. as a result, the content is concise and clear, and it may help account operators improve their performance in dissemination, making it easier for the public to receive information. although the types of articles varied, most of them integrated different types. for example, the article "can you go out without a mask? experts recommended the proper wearing of masks." not only provided an instruction but also appended a report on the pandemic. the counter-rumor article "novel coronavirus fears alcohol and high temperature, but vinegar, saline, and smoke are useless: rumors you need to know." also taught several prevention methods. none of the articles had only one type of content. our coders classified the articles based on their titles and main content. diversity in types could simultaneously enhance the practicability of the content and meet users' different demands. however, the main part of the article should be specific to prevent it from being misinterpreted. we counted and recorded the high-frequency words in these articles. they are shown in table ; words occurring more than five times were listed in it. all the articles introduced general features about covid- , mentioning some of the same words, such as pandemic, doctor, and infection. along with this, we found that the different types of articles all referred to one common dimension: instruction. for example, words including mask, wash hands, and isolation indicated instructions on how to protect the public, and they appeared in most of these articles. besides, arbidol hydrochloride capsules is a drug used to relieve the condition of infected cases; this noun also showed up in different articles, introducing an instruction on selecting drugs. most of the articles introduced instructions on prevention. it may be the usefulness perceived by readers that helped these articles win a large number of likes. [table . high-frequency words in each article. the flattened table content included article titles such as "a wuhan doctor was suspected of being infected. he recovered after days' isolation at home! please spread his life-saving strategy to everyone!" and "a prevention guideline against the new pneumonia. scientific prevention, we should not believe and transmit rumors.", together with high-frequency words such as coronavirus, pneumonia, protection, prevention, infection, and transmission.]
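as a rough sketch of how such a high-frequency word list could be produced, the snippet below tokenizes an article body and keeps words occurring more than five times, as in the table above; the jieba tokenizer and the single-character filter are assumptions for illustration, not a description of the authors' actual tooling.

```python
# Sketch: count high-frequency words in an article body, keeping words that
# occur more than five times. jieba is assumed only as one possible Chinese
# tokenizer; dropping single-character tokens is a simplification.
from collections import Counter
import jieba

def high_frequency_words(article_text, min_count=5):
    tokens = [t for t in jieba.cut(article_text) if len(t.strip()) > 1]
    counts = Counter(tokens)
    return [(word, n) for word, n in counts.most_common() if n > min_count]

# toy usage with a placeholder string
print(high_frequency_words("口罩 口罩 口罩 口罩 口罩 口罩 洗手", min_count=5))
```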
these articles usually have positive effects on readers, but their limitations should also be considered. for example, figure shows a screenshot of reviews from readers of the article with the most likes. most of the readers admired and appreciated the usefulness of the article. the comments suggested that the article could facilitate the timely acquisition of knowledge about prevention during a health crisis. furthermore, a story about the heroic contributions of doctors and other people may inspire readers. a counter-rumor type of article may help users identify inaccurate information and prevent them from adopting inappropriate prevention methods. however, popularity is accompanied by limitations, and this issue should be considered. given the severity reported in these articles, the information may lead to unnecessary public panic, and some people may even protect themselves excessively, as the disease is devastating if uncontrolled. however, with the contributions of physicians, we must be hopeful about the future situation. therefore, account operators should consider how to reduce the negative effects of articles on readers. the organizational structure, location, and description of information on social media affect public attention [ ] . multimedia applications, such as infographics and videos, make the structure clear and concise. certain types of articles can satisfy the public's demand for information and improve the popularity of articles. diversity in types can promote users' liking behavior and health information dissemination because of the superior content of articles. besides, these articles always referred to a common dimension which introduced some instructions. h was proven to be true. this study aimed to explore the effects of health information dissemination on users' information behavior on wechat official accounts. our hypotheses were tested using two-wave data collected from the xigua website over a period of days. meanwhile, we further explored the content characteristics of the headlines with an unexpected number of likes to answer our research question. first, our results suggested that not all types of articles significantly affected users' tendency to follow an account. for nonmedical institution accounts, report and story types positively influenced the change in the number of followers. for medical institution accounts, report and science types exerted positive effects. however, we did not find significant relationships for individual accounts. second, the number of headlines on the pandemic contributed to likes for nonmedical institutions. however, we did not obtain the same result for medical institution and individual accounts. for medical institution accounts, we found an article with an unexpected number of likes of up to , . when we removed this outlier, the analysis result was significant. thus, a few articles with an unexpected number of likes could determine the regression result to a great extent.
third, we reviewed headlines to further explore the characteristics of information from articles that could affect users' liking behavior. in the headlines, organizational structure, the manner of description, and the application of multimedia contributed to unexpected likes. users showed an inclination to instruction and story types, especially those from nonmedical institution accounts. health authorities should take advantage of these accounts to enhance health information dissemination and reduce public panic. meanwhile, paying attention to methods of delivering messages was crucial, for the application of multimedia such as graphics, videos or pictures may make it easier for the public to understand and receive information. besides, diversity in types was also crucial in encouraging likes, and the role of instructions should not be left out. these dimensions composed the common features of information that can impact users' likes. this study has theoretical implications. on facebook, twitter, and other social media, information dissemination can affect users' information behavior [ ] . in the present study, we expanded the research scope to wechat in china, especially in the health field. we identified several factors that affected users' information behavior on this platform. for nonmedical institution accounts, report and story types of information should be emphasized. likewise, report and science types should be promoted for medical institution accounts. particular account groups, multiform transmission, and diversity in types, including instruction and story, are essential for promoting popularity. this study also has practical implications. first, on social media, account operators can promote information dissemination. analyzing users' information behavior may allow them to determine the kind of information that satisfies the public. they should fully utilize the superiority of headlines to enhance diffusion. second, the above conclusions could be explored further and in depth by analyzing twitter, facebook, or youtube trends in other countries, contributing to the worldwide campaign in the medical informatics and health dissemination domains. this strategy might help authorities determine what kind of information the public needs. if dissemination is efficient, the public will receive accurate information and useful prevention suggestions in a timely manner. this method can help health authorities successfully manage this public health emergency [ ] . this research has several limitations. during the data analysis, we did not identify the significance of some variables because the sample size was small. for example, we only assessed individual accounts among accounts. for this reason, we could not ensure that such insignificant factors would not contribute to users' information behavior. in our future work, we will involve more comments from users on different groups of accounts and expand the sample size to conduct further analyses. we considered that likes mainly denote users' appreciation. however, some readers may like an article without a valid reason or for conformity [ ] . other users can express their positive feelings by commenting. thus, in further studies, we will assess the implications of users' behavior, such as liking, following, commenting, and even sharing, based on quantitative methods. interviews that reflect users' actual feelings are also essential. we will also explore the effect of content on readers by analyzing their comments. 
the effects of information dissemination on users' information behavior during the covid- pandemic were examined. two models were developed to test our hypotheses that were partially confirmed. in content analysis, some common characteristics that contributed to users' tendency to like a post were identified. however, this study has some limitations. in our future work, we will include more accounts and adopt measures such as developing a synthetic model and quantitatively assessing these behaviors to solve the problems. who. novel coronavirus ( -ncov) situation report- importance of social media alongside traditional medical publications expectation survey report on users' media consumption and use during the epidemic csm media research: hongkong, china, . available online disseminating research findings preparing for generation y wechat annual data a study on influential factors of wechat public accounts information transmission hotness understanding the function constitution and influence factors on communication for the wechat official account of top tertiary hospitals in china: cross-sectional study exploring knowledge filtering processes in electronic networks of practice social media-promoted weight loss among an occupational population: cohort study using a wechat mobile phone app-based campaign twelve years of clinical practice guideline development, dissemination and evaluation in canada how social media exposure to health information influences chinese people's health protective behaviour during air pollution: a theory of planned behaviour perspective the research on the influencing factors of users' liking behavior in wechat global reaction to the recent outbreaks of zika virus: insights from a big data analysis public reaction to chikungunya outbreaks in italy-insights from an extensive novel data streams-based structural equation modeling analysis when information from public health officials is untrustworthy: the use of online news, interpersonal networks, and social media during the mers outbreak in south korea perceptions about disseminating health information among mommy bloggers: quantitative study pregnancy-related information seeking and sharing in the social media era among expectant mothers in china: qualitative study pregnant women sharing pregnancy-related information on facebook: web-based survey study the use and value of digital media for information about pregnancy and early motherhood: a focus group study selection of users behaviors towards different topics of microblog on public health emergencies empirical study on recogniti on and influence of opinion leaders in emergency partnering with mommy bloggers to disseminate breast cancer risk information: social media intervention research on influential factors of thumbs-up of interior advertorial of wechat official accounts an approach to the study of communicative acts uncertainty in times of medical emergency: knowledge gaps and structural ignorance during the brazilian zika crisis from concerned citizens to activists: a case study of south korean mers outbreak and the role of dialogic government communication and citizens' emotions on public activism when ignorance is bliss the role of motivation to reduce uncertainty in uncertainty reduction theory medicine: before covid- , and after a new dimension of health care: systematic review of the uses, benefits, and limitations of social media for health communication mothers' perceptions of the internet and social media as sources of parenting and health 
information: qualitative study health care professionals' social media behaviour and the underlying factors of social media adoption and use: quantitative study professional use of social media by pharmacists: a qualitative study engaging the family medicine community through social media mapping physician twitter networks: describing how they work as a first step in understanding connectivity, information flow, and message diffusion medicine . : social networking, collaboration, participation, apomediation, and openness instrumental utilities and information seeking state of the art in social network user behaviours and its future using social media data to understand the impact of promotional information on laypeople's discussions: a case study of lynch syndrome health-seeking influence reflected by online health-related messages received on social media: cross-sectional survey understanding wechat users' liking behaviour: an empirical study in china what makes us click "like" on facebook? examining psychological, technological, and motivational factors on virtual endorsement traffic in social media i: paths through information networks understanding likes on facebook: an exploratory study an empirical study on influencing factors of continuous attention to wechat public accounts: an information characteristics perspective the research on the factors influencing the dissemination of media official micro-blog in the event of emergency patient continued use of online health care communities: web mining of patient-doctor communication information technology adoption and continuance: a longitudinal study of individuals' behavioural intentions study on multi-channel reading behavior choice in the all-media age sometimes more is more: iterative participatory design of infographics for engagement of community members with varying levels of health literacy impact of game-inspired infographics on user engagement and information processing in an ehealth program creating collective attention in the public domain: human interest narratives and the rescue of floyd collins impact of communication measures implemented during a school tuberculosis outbreak on risk perception among parents and school staff the emotional request and subject alienation of this article is an open access article distributed under the terms and conditions of the creative commons attribution (cc by) license we thanks for the funding support of school of medicine and health management, huazhong university of science and technology. the authors declare no conflict of interest. key: cord- -l seadro authors: heumader, peter; miesenberger, klaus; murillo-morales, tomas title: adaptive user interfaces for people with cognitive disabilities within the easy reading framework date: - - journal: computers helping people with special needs doi: . / - - - - _ sha: doc_id: cord_uid: l seadro adaptive user interfaces are user interfaces that dynamically adapt to the users’ preferences and abilities. these user interfaces have great potential to improve accessibility of user interfaces for people with cognitive disabilities. however automatic changes to user interfaces driven by adaptivity are also in contradiction to accessibility guidelines, as consistence of user interfaces is of utmost importance for people with cognitive disabilities. this paper describes how such user interfaces are implemented within the easy reading framework, a framework to improve the accessibility of web-pages for people with cognitive disabilities. 
the concept of user interfaces that have the ability to change according to the user's requirements, skills, environment, situation, or other criteria has been around for a long time. in general, these concepts can be categorized in adaptive user interfaces and adaptable user interfaces. • adaptive user interfaces [ ] : these systems change their structure, functionalities, and content for the individual user in real time. this is achieved by monitoring the user status, the system state, and the current situation that the user is facing. by using an adaption strategy (mostly rule based), the user interface is changed at run time. • adaptable user interfaces [ ] : this user interfaces are highly adjustable in terms of presentation of information, display of user interface and its components or user interaction/input concepts. the settings are usually stored in a user profile and the user is able to adjust those settings in advance, usually in a settings dialog. during runtime, in contrary to the adaptive user interfaces, these settings do not change. according to laive [ ] , methods for user interface adaptations can further be assigned to the following categories: • adaptable/manual: the user manages the process and performs all actions • adaptable with system support/user selection: the user dominates the adaptation process and the system supports it • adaptive with user control/user approval: the system dominates the adaptation process under the supervision of the user. the system initiates the action and notifies the user about the alternative that he/she has to choose • adaptive/fully adaptive: the whole process is managed by the system, which decides and implements the action based on the preferential model and the main uses adaptive user interfaces show great potential towards enhancing the usability and accessibility of computer systems. user tracking with state-of-the-art sensors could give estimations about the current user's status, and could trigger adequate system reactions based on that [ , ] . however, the added adaptability for user interfaces to improve accessibility might have some unwanted side effects. for example, increasing the font size to address the vision impairment of a person might result in longer text passages and the need to scroll, which in turn results in increased attention and memory demands for the user. therefore, providing extensive adaptability is a highly complex task, as side effects and conflicts are difficult to locate [ ] . another unwanted site effect of fully adaptive user interfaces is the inconsistency caused by the dynamic changes to the user profile, which is then reflected in the user interface. this is another drawback, as consistency across webpages is very important for people with cognitive disabilities and also addressed in guideline . : predictable of the w c web content accessibility guidelines (wcag . ) [ , ] . this paper describes how user interface adaptations are realized within the easy reading framework. easy reading is a software tool that supports cognitive accessibility of web content and enables people with cognitive disabilities to better read, understand and use web pages. this is achieved through functionalities as: • adjustment of the layout and structure of webpages, • explanation/annotation of web content with symbols, videos or pictures, • automatic/supported modification of web content e.g. by translating it into plain language or easy read. 
easy reading has been designed as a cloud based solution, allowing people to interact with clients implemented as browser extension or mobile applications. within the framework, user interfaces, user interaction and the provided help are adaptable and, to a certain extent, also adaptive for the individual user. in recent years several research projects have been dealing with the creation of adaptive user interfaces for people with disabilities. among those projects, prominent examples are gpii [ ] or myui [ ] . gpii allows the personalization of user interfaces, by the use of cross-platform user profiles for user interface settings, and rule-based and statistical approaches for matchmaking [ ] . the architecture of the gpii was developed by the cloud all project uses an ontology of user needs and preferences that are directly linked to user settings [ ] . the linking is done with a rule based matchmaker (rbmm) that matches user preferences and needs, with solutions available to the system and settings supported by the solution. the matchmaker results therefore in a fully configured solution on the specific system based on the individual preferences and needs of a user [ ] . myui on the other hand was an eu funded project that enabled the generation of individualized user interfaces that would adapt to the individual users needs in realtime, based on a user profile and the actual device [ , , ] . these approaches all work with a user profile that is usually stored online. once the user logs in, the profile is downloaded and a mechanism uses this profile to create a dynamic configuration of software, assistive technology, user interface or the whole operating system for the individual user. adaptations can only be made on features and software that are currently available on the actual device or software, and therefore the user experience might change on different devices. while this approach is sufficient for most users, it is problematic for people with cognitive disabilities, as consistency of user interfaces is very important for them [ ] . another drawback of this solution is that features must be installed on the device first, before they can be used and adapted, which might be another obstacle for people with cognitive disabilities. the easy reading framework allows users to obtain assistance for difficult to cope with content on any webpage. this is done by cloud based software-clients that inject a dynamically generated user interface directly in the current page. by this users are able to trigger different forms of help provided by the framework. the result of the support is then rendered again directly in the webpageallowing the user to stay at the original content and learning to cope with it in the future. figure shows a screenshot of easy reading on a wikipedia page. the user interface is dynamically injected on the rightthe result of triggering an assistance tool provided by the framework is directly rendered within the web-page. in this case the help was an automatically crated aac -version of the second paragraph accomplished by a text analysis cloud service in combination with an aac library. adaptations within the easy reading framework can be applied to the user interface, the help that is provided, the user interaction (how help is triggered) and finally how the help is rendered and presented within the web-page. similarly to existing approaches, these adaptations are based on a user profile that stores user preferences and abilities. 
currently the user profile hosts the following support categories for the help provided by the framework: • text support: indicates whether and how the user needs help with text and content in general. • layout support: how the layout of websites should be displayed for the user. • reading support: if and how the user needs support in reading text • symbol support: indicates if and how the user needs support with symbol language in addition, the profile holds categories for triggering and displaying the provided help: • input support: stores the preferred way to triggering help and to select where on the web-page help is needed • output support: specifies the preferred way of rendering the help provided based on these categories, once the user logs in with his or her user profile, a dynamically optimized configuration is created for the individual user (see fig. ). unlike other approaches, this configuration is not created locally, but in the cloud, and it also includes personalized user interfaces, personalized help and a personalized way of displaying the help. in this manner, clients within the easy reading framework do not host the code for any feature provided by the framework, as this is dynamically created for each user. this is a big advantage over other architectures, as no matter from which device the user logs in, the user experience is always the same. another advantage is that no additional software needs to be installed, as every feature is prepared in the cloud and downloaded during user login. finally this enables learning and improving personalization of service provision cross different web pages and over time. a drawback of the solution is however that it only works within a browser, while other solutions like gpii would also work across different applications or even operating systems. here however each application has to be gpii-compliant and must implement its interfaces. expanding the easy reading approach towards this broader application scenarios are considered as future challenges. as user skills, know-how and preferences change over time, the easy reading framework hosts a mechanism that automatically updates the preferences of the user profile based on user tracking and usage statistics (fig. ) . while the user is surfing the web, user interaction and tool usage is evaluated, and an updated profile is calculated. in addition, the system also hosts an additional optional user tracking component that creates an estimation on the users' current focus and detects and understands the situation the user faces (e.g. attention, stress, and confusion) at the moment. by this the additional feedback is created whether • the user needs help for a part of the content, • the help applied by the framework is accepted by the user • the user has problems with the user interface or the user interaction required to trigger help of the framework user tracking combines different sensors that feed into a software reasoner to calculate this estimation. currently an eye-tracker that tracks the focus of the user on the web-page is used to detect cognitive load. additionally, a smartwatch that detects heartbeats and heart rate variability is utilized to detect stress. based on this sensor data and the user interaction on the web-page, every hour the matchmaking component is triggered with the updated profile, resulting in a new dynamic configuration. based on this a recommendation to add or remove functionality is triggered and presented via a dialog to the user. 
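a minimal sketch of this profile-to-configuration matchmaking, including the approval gate applied before any change takes effect, might look as follows; the category values, the rule table, and the tool names are illustrative assumptions rather than the framework's actual implementation.

```python
# Sketch: rule-based matchmaking from an Easy Reading-style user profile to a
# per-user tool configuration, with changes gated by explicit user approval.
profile = {
    "text_support": "plain-language",
    "layout_support": "simplified",
    "reading_support": "text-to-speech",
    "symbol_support": "aac-symbols",
    "input_support": "click",
    "output_support": "inline",
}

RULES = {  # profile value -> tool injected into the page (illustrative names)
    "plain-language": "translate_to_plain_language",
    "simplified": "layout_simplifier",
    "text-to-speech": "tts_reader",
    "aac-symbols": "symbol_annotator",
}

def build_configuration(profile):
    """Match profile values against the rule table and return the tool set."""
    return sorted({RULES[v] for v in profile.values() if v in RULES})

def recommend_change(current_config, updated_profile, ask_user):
    """Propose the newly matched configuration; keep the old one unless approved."""
    proposed = build_configuration(updated_profile)
    if proposed != current_config and ask_user(f"add/remove tools: {proposed}?"):
        return proposed          # user accepted: the updated profile is kept
    return current_config        # user rejected: revert, the UI stays consistent

config = build_configuration(profile)
updated = {**profile, "reading_support": "none"}       # e.g. after usage tracking
print(recommend_change(config, updated, lambda msg: True))
```

the approval step mirrors the recommendation dialog described next: the system proposes, but the user decides, so consistency of the interface is preserved.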
figure shows such a recommendation dialog. if the user accepts the dialog, the updated profile is saved and the tool is added into the current user interface. on the other hand, if the user rejects the recommendation, the changes to the profile are reverted. user approval of any changes is of utmost importance, as consistency of user interfaces and user interaction must be preserved. currently the system is able to make recommendations for different tools to simplify web content as well as for different user interfaces that the framework provides. in the future recommendations on changing user interaction to trigger tools and displaying help provided by the framework will be implemented. due to the covid- outbreak large scale user tests were not possible. preliminary tests with end users showed that a purely adaptive user interface without any user approval is not appropriate for end users. on the other hand, most users were able to understand the current implementation with user approval. once the covid- situation allows it, more exhaustive user tests are planned. system adaptivity and the modelling of stereotypes evaluating the utility and usability of an adaptive hypermedia system benefits and costs of adaptive user interfaces from adaptive hypermedia to the adaptive web a design patterns approach to adaptive user interfaces for users with special needs a knowledge-based approach to user interface adaptation from preferences and for special needs myui: generating accessible user interfaces from multimodal design patterns wcag . -web content accessibility guidelines cloud all priority applications and user profile ontology (d . ). public deliverable of the cloud all project user control in adaptive user interfaces for accessibility monitoring for accessibility and university websites: meeting the needs of people with disabilities integration of a regular application into a user interface adaptation engine in the myui project requirements for the successful market adoption of adaptive user interfaces for accessibility global public inclusive infrastructure (gpii) -personalisierte benutzerschnittstellen. i-com ), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the creative commons licence and indicate if changes were made. the images or other third party material in this chapter are included in the chapter's creative commons licence, unless indicated otherwise in a credit line to the material. if material is not included in the chapter's creative commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use acknowledgments. this project has received funding from the european research council (erc) under the european union's horizon research and innovation programme (grant agreement no. ). key: cord- - i n m authors: okoshi, tadashi; sasaki, wataru; kawane, hiroshi; tsubouchi, kota title: nationalmood: large-scale estimation of people's mood from web search query and mobile sensor data date: - - journal: nan doi: nan sha: doc_id: cord_uid: i n m the ability to estimate current affective statuses of web users has considerable potential towards the realization of user-centric opportune services. however, determining the type of data to be used for such estimation as well as collecting the ground truth of such affective statuses are difficult in the real world situation. 
we propose a novel way of performing such estimation based on a combinational use of users' web search queries and mobile sensor data. our large-scale data analysis with about , , users and a recent advertisement log revealed ( ) the existence of a certain class of advertisements for which mood-status-based delivery would be significantly effective, and ( ) that our "national mood score" shows the ups and downs of people's moods in the covid- pandemic, inversely correlated with the number of patients, as well as the weekly mood rhythm of people. the ability to determine current affective statuses of users has considerable potential to enable the provision of user-centric opportune services tailored to specific user statuses. web services, for example, can be improved by adapting various types of parameters such as the presentation timings, presentation tone, content, as well as content modality. when alice has an emotionally negative status, the news web service can highlight some interesting news or present advertisements that can possibly cheer her up, thus attempting to align itself with her emotions. however, determining affective statuses of web users outside a controlled in-lab configuration, particularly in real-world situations, is a difficult task. the first problem concerns the type of data from which the affective status of a user can be estimated. typically, sensing and determining the emotional state of a person require psycho-physiological data such as heart rate (hr) [ ] , hrv, electrocardiogram (ecg), and electroencephalogram (eeg) data [ , ] . however, the collection of such data in real-world conditions of mobile web users is not feasible owing to the low penetration rate of such sensors in society, the additional burden on users to use such devices, and the lack of social acceptance of the collection of such data. the second problem concerns how the ground truth label on the affective status of a user can be collected. user annotation is widely used during the data-collection phase. however, this approach is also not always effective owing to multiple possible causes such as the following: ( ) the users may find it cumbersome to answer repeated questions, and ( ) the users may forget to answer the questionnaire. in this paper, as the first contribution, we show that we can estimate web users' affective status (concretely, "mood") in such a condition, based on a novel combinational use of their web search queries and mobile sensor data. to address the first problem, we target users' queries input to the web search engine as an easy-to-collect and noninvasive proxy feature to explain their mood states, focusing on the fact that almost all internet users regularly use search engines in their daily lives. because web services typically store the historical log of users' search queries at the server side, its use can be started today without having to wait for the widespread adoption of new types of sensors.
to address the second problem, we use a novel two-step mood classification with different types of models, namely the "sensor mood model (smm)" and "search-query mood model (qmm). " figure illustrates our research structure. ( ) first, we conduct a preliminary data-collection study with participants for days to collect their continuous sensor data from their smartphones; periodic subjective evaluation of their mood as ground truth annotation was also performed. ( "smm" that estimates the participant's mood statuses from specific temporal frames in which both sensor data and the user annotation were successfully collected. with the built ssm, we can estimate each participant's mood status for all the time frames during the data-collection study period. then, by combining the web search logs of the participants during the study period and mood status (based on both the users' original annotation and smm's outputs), we create our second model "qmm", which estimates the mood of a user from their search query data. as our second contribution, we also show that the use of these mood statuses in a web service (with more than million users) through the introduction of multiple evaluation parameters, including "mood-effective advertisement" and "national mood, " has significant potential. to the best of our knowledge, this research is the first to reveal this possibility. first, by analyzing the existing server-side log data stored in our web service, we investigate the relationship between multiple advertisement contents that we display to the users and the responses of the users based on their derived moods. by analyzing the recent logs of advertisement projects, including when and who viewed the advertisements and whether they clicked on them, we identified the advertisement for which mood-based delivery will be effective. we determined a certain class of advertisements where the mood tendencies of users who clicked was more positive or negative than a random distribution (section ). second, we examined the value of calculating the mood score in nation-wide by using the proposed method in this research. we calculated "the daily national mood score", the average mood score of about , , users in japan on a daily basis, and saw how the score changes over time. interestingly, we found that the score based on our proposed algorithm shows the weekly rhythm of people's mood (that drops every st working day of the week and increases again every weekend) (section . ) and the longer-term trace of people's mood in the covid- pandemic period in year that inversely correlated to the number of covid- patients (section . ). the remainder of this paper is organized as follows. section discusses related work. section describes our novel yahoo! japan emotion framework. section details data-collection study conducted for days with users. section describes model building from the collected data. section shows the results on "mood-effective advertisement" analysis. section shows our findings from our "national mood score" analysis. section discusses our further work. section concludes this paper. extensive studies on emotion began to be conducted in the th century, with a well-known study conducted by darwin [ ] . darwin says that emotion is a product of evolution and that emotions induce actions favorable to survival [ ] . numerous emotional modalities and their respective physiological responses have been studied [ , ] . 
emotional states are known to affect cognitive and athletic abilities and are reported to affect both human-human and human-machine interactions. picard generalized this research field as "affective computing" [ ] . many studies and systems have been proposed to detect and utilize the emotions of users [ , , , , ] in this field. several methods of determining user emotion have been proposed, which focus on physical characteristics [ , ] , text data [ , ] . in our research, we focus on the mood status of users. mood is related but different from emotion in several aspects [ ] . mood is usually tend to last longer than emotion and is usually a cumulative reaction while emotion is a more spontaneous reaction or emotion caused by a particular event. in recent years, various research have been conducted to estimate the mood state of the user by focusing on the sensors on smartphones. smartphones are equipped with multiple sensors and can collect a wide variety of data, such as acceleration, rotation, location, and network connectivity. also, smartphone can now be considered indispensable to our lives with, for example, more than % penetration rate in japan. in addition, smartphonebased sensing do not require additional devices in sensing, such as those for measuring heart rate, eeg or ecg, that may have additional burden on the users. owing to the rapid development and spread of smartphones, various studies have been conducted on the recognition and estimation of the mood status of a smartphone user. most studies have constructed a classification model that determines moods from the user's contextual data obtained from the smartphone sensor data and self-reporting annotation by the user [ ] . moodscope [ ] investigated the effects of the user context on the mood of a user based on the smartphone sensor data. in addition to emotion and mood, various types of internal statuses of the users, such as "interruptibility" [ , ] , have been recognized and estimated from the smartphone data. other types of sensing modality for emotion estimation is facial expressions in the image data. such research has been widely conducted [ ] , mainly by using facial action coding system [ ] . and some research on the smartphone [ ] platform have been also performed recently. in contrast to those previous works, our research highlights ( ) its novelty in the combined use of smartphone sensor data and web focusing on the data on the web, researchers are currently working on estimating the emotional states of users by analyzing text data on social networks such as twitter [ , ] and facebook [ , ] . these social network text data may contain sentences containing the affective information of users. in this research, we focus on the web search queries to estimate the users' mood status since they are easy-to-collect noninvasive type of data. in reality, it is difficult to estimate the affective state directly by using such queries since most of them comprise a few words and nouns. again, our novel approach focuses on a combination with different types of mood estimation models from smartphone sensors and search queries. figure shows the conceptual view "yahoo! japan affective service framework" being developed on yahoo! japan web service. yahoo! japan has a widely-known and widely-used web site in japanese market, with more than yahoo! japan -branded services such as "search", "news", "shopping", "auction", "movie" and "weather" and with the total number of million registered users. 
(note that japan's national population is about million.) in addition to the conventional web pages optimized for both pc and mobile devices, yahoo! japan has its own smartphone application on both the ios and android platforms. more than million users are using these applications, making them one of the most popular smartphone applications in this country. our view of the yahoo! japan affective service framework is as follows. at the server side, each of the more than yahoo! japan services logs users' usage, including page views and their duration, clicks, and various types of input, with web search keywords being the most prominent example. in addition, at the users' client side, our smartphone application can opportunistically collect various types of sensor data on the users' smartphone, given the users' permission. hence, with such multiple types of input data from both the client and server sides, we can opportunistically estimate the user's affective status. towards the realization of the framework above, first collecting data and the users' subjective mood evaluations from the real world is inevitable. thus, we first conducted a data-collection study with users for days. we collected continuous data from various types of smartphone sensors as well as the users' subjective mood evaluations (up to times a day) as the ground truth label. for the study, participants were recruited through an external agency. the recruitment criteria were as follows: ( ) the age should be in the range of - years, ( ) must own an active yahoo! japan registered account, ( ) must have the ability to use the yahoo! japan search functionality once or more times per week and should have performed a search at least once in the last month, ( ) must own and use an apple ios smartphone as a private primary phone in their daily life, and ( ) must be a user of an iphone or later and ios version or above. the participants were informed that this study was "an experiment about your condition" during the recruitment process. a total of users, consisting of university students, staff members, and research engineers aged between and years (average: . ), were recruited. the study was conducted for days, from november , to january , . we developed a dedicated smartphone application for this study, as illustrated in figure . the application was developed for the ios platform for several reasons. first, the market share of ios is higher than that of android in the japanese market; thus, the recruitment of participants is easier. second, the number of iphone models (such as iphone , plus, , plus, x, and ) is lower than that of android phones (hundreds of models by dozens of manufacturers with different os-level optimizations in power management, sensing, etc.). thus, we can easily test the application with these phone models to achieve higher execution stability. third, thanks to the ios aware framework [ , ] , we can implement and deploy an application that can continuously collect various sensor data, even though ios is a stricter environment as a sensing platform than android. once the application is installed on the smartphone of a user, it continuously collects multiple types of data from the embedded sensors of the phone, as detailed in table , and periodically uploads the data to the server. the application can also issue a notification (as shown on the left-hand side of figure ) at the timings configured by the developer.
once the user responds to the notification, the application opens a questionnaire, where the user can report their current mood status on a -level likert scale ( . strongly negative, . negative, . moderately negative, . neutral, . moderately positive, . positive, and . strongly positive). our experimental procedure consisted of the following three parts: ( ) each participant had a meeting with a study researcher at the beginning of the study and received basic information and instructions about the study, followed by the signing of a consent form. next, the participants were asked to install and launch our software on their smartphones. they were asked to grant the following permissions to the application: ( ) mic, ( ) push notification, ( ) motion and fitness activity, and ( ) location (configured as "always"), using the data sensing features of the ios platform. ( ) after the initial meeting, the -day study period started. during this period, a push notification appeared six times (at : am, : am, noon, : pm, : pm, and : pm) every day. each participant was asked to complete the survey within hours after the delivery of each notification. when the participant opened a notification, the application screen ( figure ) appeared and asked for the affective mood of the user on the -level likert scale. the participant selected the status, and after a confirmation prompt, the answer was submitted to the server. we created an instant point reward system in the application. each participant scored , , , or points for - , , , or answers, respectively, in a day. the earned reward points were accumulated throughout the study period. when a participant reached the configured minimum total reward points, i.e., points (by answering answers every day for days or answers every day for days), they received a payment. they received an additional payment when they exceeded points. this section describes our building of two different models, the sensor mood model (smm) and the search-query mood model (qmm). as introduced and illustrated in figure , smm classifies a user's mood from a set of features computed from sensor data. using the built smm model to obtain more training data, we then build qmm, which estimates a user's mood from search queries. for building the sensor mood model (smm), we follow an approach based on time-frame-based feature extraction from the time-series sensor data and its classification, which is widely used in the activity recognition area [ ] . first, we extracted features for each hour time window from the raw sensor data collected in our study with users for days. the features were extracted for each sensor type, and the types of extracted features differ depending on the sensor type. table summarizes the number of features extracted along with several representative feature types. for model building, we then constructed a supervised machine learning model from the frames, based on the extracted feature vectors and the self-reported mood statuses as ground truth labels. in this model building, we treat the self-reported mood status as a three-class classification problem. the collected mood answers (originally on the -level likert scale) were assigned to three different labels: − for "strongly negative", "negative", and "moderately negative", for "neutral", and + for "strongly positive", "positive", and "moderately positive." we chose random forest [ ] as the machine learning algorithm, which revealed the best classification performance compared with others.
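as an illustration of this smm training step, the sketch below maps the -level likert answers to the three classes and fits a random forest on per-window feature vectors; the feature matrix, window counts, cross-validation folds, and hyperparameters are placeholders rather than the study's actual values.

```python
# Sketch: SMM training as described above. X would hold one feature vector per
# time window (e.g. per-sensor means and variances); values here are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def likert_to_class(answer):
    """Map a 7-level Likert answer (1..7) to -1 / 0 / +1 as in the text."""
    return -1 if answer <= 3 else (0 if answer == 4 else 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))           # placeholder per-window feature matrix
answers = rng.integers(1, 8, size=500)    # placeholder Likert answers
y = np.array([likert_to_class(a) for a in answers])

smm = RandomForestClassifier(n_estimators=200, random_state=0)
print("cv accuracy:", cross_val_score(smm, X, y, cv=5).mean())

smm.fit(X, y)
# Once fitted, the SMM can label every sensed window, including unannotated ones,
# which is what later enlarges the training data available to the QMM.
pseudo_labels = smm.predict(X)
```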
in terms of performance, a -fold cross-validation of the built model yields an accuracy of . %. table shows the results of the overall performance evaluation. qmm is a model that examines the relationship between a user's web search queries and the user's mood score during the search behavior. after training, it classifies the user's mood score from their search query data. here, there can be two different types of mood score, namely (a) the scores answered by the data-collection participants via the questionnaire and (b) the scores estimated by smm based on the collected sensor data. in this section, we explain the concept of qmm and how the additional use of (b) increases the performance of qmm. for each fixed time range (which we call a "session"), qmm is trained from the data of a user's mood score and search behavior during the session. to investigate the validity of the trained qmms qualitatively, we employed logistic regression, a typical example of a "white box" model with high interpretability. note that it is not necessary to specify logistic regression as the training scheme in actual operation; we believe that non-linear svms and decision-tree-based regressions such as xgboost (which are specialized for performance) are also effective. one session was defined as one record, and training was performed on a regression of the following form:

y = w_0 + w_1·x_1 + w_2·x_2 + ... + w_m·x_m

where y is the mood score and w_0, ..., w_m are the learned weights. x_j is the feature assigned to search query j, set to 1 if that query was searched in the session and 0 if it was not; it indicates only whether the query was searched, not the number of searches. in this training, sessions that do not have both "search behavior" and "mood score" data are treated as missing data. therefore, mood scores based on the users' raw survey responses face the challenge of a limited number of responses available for training. in such a situation, our smm model, together with the collected sensor data, is an effective means of increasing the number of mood scores to be used for training. since the sensors are always on as long as the smartphone (with our application) is on, smm can be used to estimate a user's mood virtually around the clock. we use a session three hours long; the search queries retrieved in those three hours were used as features. since smm-based mood scores are available for hours a day, the length of each session could logically be shorter (more fine-grained). however, if the length is too short, there is a risk that the questionnaire-based mood scores of the comparison method become too sparse to be learned from. hence, the sessions needed to be reasonably long. from this discussion, we decided to use a session length of three hours in this study. we built qmm models, one trained only from the questionnaire answer data, and another with additional training data based on the smm outputs. for both models, the conditions of the performance evaluation were as follows. the data were randomly divided into % training data and % evaluation data. considering the effect of randomness, the evaluations were conducted times. the training data were balanced so that the amount of positive and negative data was the same before training; for the evaluation data, we did not perform balancing. the results are shown in table , which clearly confirms the effectiveness of smm. compared to our baseline qmm without smm, the accuracy increases from % to % with the additional data brought by smm.
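to make the training setup above concrete, here is a minimal sketch; the session contents, query strings, and the way smm pseudo-labels are appended are illustrative assumptions, not the production pipeline, and neutral sessions are omitted for simplicity.

```python
# Sketch: QMM as a logistic regression over binary "query was searched" features.
# Each session is a (set_of_queries, mood_label) pair; labels come from
# questionnaires and, optionally, from SMM predictions on unannotated sessions.
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.linear_model import LogisticRegression

questionnaire_sessions = [({"weather tokyo", "lunch"}, 1), ({"headache"}, -1)]
smm_sessions = [({"train delay"}, -1), ({"concert tickets"}, 1)]  # pseudo-labeled

sessions = questionnaire_sessions + smm_sessions  # SMM enlarges the training set
queries = [q for q, _ in sessions]
labels = [m for _, m in sessions]

binarizer = MultiLabelBinarizer()          # one binary column per distinct query
X = binarizer.fit_transform(queries)       # 1 = the query was searched in the session
qmm = LogisticRegression(max_iter=1000).fit(X, labels)

# Per-query weights indicate association with positive vs. negative mood.
weights = dict(zip(binarizer.classes_, qmm.coef_[0]))
print(weights)
```

in this sketch the smm pseudo-labels simply extend the questionnaire-labeled sessions, mirroring the increase in training data discussed next.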
the table also shows that the number of training data points was more than doubled by smm, indicating that this more-than-doubling of the training dataset contributed greatly to the significant improvement in prediction accuracy. from these results, we concluded to adopt a qmm trained with smm in this research, and moved on to our evaluation experiments. to investigate how effectively our mood model works, we evaluated it with our past web advertisement business cases. we conducted an experiment on advertisement cases that were actually delivered to our users in the past. we examined the impression logs (the number of times an advertisement was exposed to users) and click logs for those ads, and tested offline whether there were ads for which users' clicking behavior differed depending on their mood state just before the click. the targeted dataset consisted of ads served through yahoo! japan 's ad service. to ensure a sufficient amount of impression and click volume for the data analysis, the ads were selected from historical ad business data stored in our servers as the most recent projects with a delivery record of at least days. in total, the targeted ads had at least about billion user impressions (views) and about million clicks during the -day period. user ids and timestamps for impressions and clicks on ads are stored in our server's internal storage. therefore, through the user id in each impression or click log, we can link the user's history of organic search behavior, and through this link, each user's three-hourly mood score can be calculated by using our qmm model. finally, the results of this calculation can be used to analyze whether or not a user in a certain mood state clicked on an ad when they viewed it. in this study, a pairwise method was used to investigate whether there are ads whose click outcomes change depending on the user's mood score. first, we organized the logs for the days the ads were served and converted the data into the format "timestamp, the viewer's user id, clicked or not." then, for each day, we randomly selected two records from all the records. if only one of the two extracted records was a "clicked" log, we compared the mood scores of the two records with each other: we marked "positive" when the clicked user's mood score was higher, and vice versa. this pairwise extraction process was performed million times for each day of the log, to determine whether each ad was more likely to be clicked on when the user's mood score was positive or negative (a code sketch of this pairwise procedure is given at the end of this passage). existence of "mood-effective ads": we show the result in figure . the x-axis is the number of days "positive" wins out of days, while the y-axis shows the percentage of such advertisements (among the ads). two lines are depicted in the figure: ( ) the pairwise comparison result based on the actual estimated mood scores, and ( ) a theoretical line that would be drawn if we assume that the winner in each pairwise comparison attempt was completely random ( % - %). firstly, as is easy to see, the random theoretical line ( ) has a convex shape that expands toward the center. in other words, line ( ) can be regarded as the result of a hypothetical experiment of consecutive coin tosses. thus, a consecutive win (or loss) over all out of days is obviously a very rare event. (the probability of positive winning on all days is about . % when we randomly assign scores; this corresponds to out of every , ad deliveries.)
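the pairwise procedure referenced above can be sketched as follows; the record format, trial count, and the decision to skip mood-score ties are placeholders and simplifying assumptions, not the production implementation.

```python
# Sketch of the pairwise scoring procedure. For one ad on one day, records are
# (mood_score, clicked) tuples; the loop counts how often the clicked record of a
# mixed pair has the higher mood score. Ties in mood score are simply skipped.
import random

def pairwise_positive_ratio(records, trials=1_000_000):
    """records: list of (mood_score, clicked) for one ad on one day."""
    positive = negative = 0
    for _ in range(trials):
        (m1, c1), (m2, c2) = random.sample(records, 2)
        if c1 == c2:                       # need exactly one clicked record
            continue
        clicked_mood = m1 if c1 else m2
        other_mood = m2 if c1 else m1
        if clicked_mood > other_mood:
            positive += 1
        elif clicked_mood < other_mood:
            negative += 1
    return positive / (positive + negative) if positive + negative else None
```

repeating this per day and counting on how many days the ratio exceeds one half gives the per-ad "positive wins" tally plotted in the figure.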
similarly, the same kind probably with wins (or losses) is about . % ( out of , ) . however, a clear difference can be observed when looking at the actual data ( ). the number of ads where the positive wins on all days or loses on all days can be found in about % ( out of ) of the ads, if they would be served by the mood score. furthermore, the results with "more than wins or losses" scored ( %). we also can observe that the shape of the line ( ) convexity is spread out to look like it has been crushed to the side. from these results, we can confirm that there are indeed certain advertisements that are effective for delivery based on the recipients' mood scores. for the number of trials " million", we tried several different numbers and confirmed that the variability was sufficiently small with this number of iteration. figure shows the standard deviation scores for the positive/negative judgment of the ads in case of different number of trials in the pairwise comparison method. ads were arbitrarily selected, the metric was applied, and the standard deviation of the scores of the positive/negative ratio was obtained. as shown in the figure, the score obtained after one million trials has . , which means that the result may increase or decrease by about %. on the other hand, with million trials, the standard deviation was about . %, which means that the score does not change by more than that number of trials. at this level, we determined that the score would be statistically reliable. we further examined in detail the ads that were more likely to be clicked on in the same emotional state for more than out of days. although we cannot provide examples of actual advertisements delivered due to the confidentiality of the business-related information, we found that the ads that were most likely to be clicked on when the user was in a good mood were those that contained the keywords "free" and "deals," and that many users were considering purchasing a product or service in the future. on the other hand, the common denominator of advertisements that were more likely to be clicked on when users were in a bad mood was the content of the advertisements to relieve users of their complaints. it is very interesting to note that there were clear differences between the ad groups that were more likely to be clicked on in good and bad moods. the purpose of this experiment is to examine the value of calculating the mood score in nation-wide by using the proposed method. regarding the registered users of yahoo! japan , our internal data on their demographics shows that almost all of them are geographically located in japan. therefore, by calculating and averaging the mood scores of all those users on a given day, we can derive a value that we call "the national mood score" of japan on a daily basis. in this evaluation, for each day of the given dataset, we ( ) computed (individual) daily mood scores of approximately , , yahoo! japan search users from their search query logs previously acquired and stored, and ( ) calculated the average of them. we name this average score as "daily national mood score". then, we ( ) examined the changes of this score over time. note that the exact number of the users for the calculation changed day by day, since we targeted yahoo! japan registered users who used our search engine for at least once, for each particular day. for our types of evaluation goals, we used two different datasets with different periods and duration, as we present in section . and . . 
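the daily aggregation just described, together with the per-year normalization used in the evaluation that follows, could be computed along these lines; the input file, column names, and the ratio-based normalization are illustrative assumptions.

```python
# Sketch: daily national mood score as the mean of per-user daily mood scores,
# then Sundays normalized to the first Sunday of each year. Names are placeholders.
import pandas as pd

scores = pd.read_csv("daily_user_mood.csv", parse_dates=["date"])  # user_id, date, mood_score

national = scores.groupby("date")["mood_score"].mean().rename("national_mood")

# keep Sundays only and normalize each year to its first Sunday (ratio chosen
# here as one plausible normalization)
sundays = national[national.index.dayofweek == 6].to_frame()
sundays["year"] = sundays.index.year
sundays["relative"] = sundays.groupby("year")["national_mood"].transform(
    lambda s: s / s.iloc[0]
)
print(sundays.head())
```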
as described earlier, one of the challenges in this research is to show the effectiveness of smm when it is used inside the qmm training. thus, in this evaluation we compare ( ) qmm with smm as our proposed method and ( ) qmm without smm as a comparative method. note that all the conditions (algorithm, hyperparameters, data split ratio) are the same for these two methods. the first case is an analysis of the trend of the daily mood score over four weeks. we want to see how the national mood score changes within a month, a relatively short term. for this analysis, we used a dataset for the period from july , to july , . we carefully chose this period from the stored logs to avoid including major news events that could have a significant impact on many users. figure shows the resulting national mood scores for this period; the x-axis represents the date and the y-axis the daily score. (figure : daily national mood score for weekdays and weekends; the two lines show the proposed and the comparative method, with the weekends and a national holiday falling on a monday marked along the date axis.) a very interesting result we can read from the figure is that the scores tend to be clearly positive on weekends and more negative on mondays, when the workweek begins (or on tuesdays when the monday is a holiday, as on july ). although no ground-truth data exists on nationwide mood, this tendency to feel better later in the week and worse again on monday is intuitively plausible in a society where many people work monday through friday, as the term "blue monday" suggests. in fact, there is evidence that can explain this. according to a white paper by the ministry of health, labour and welfare [ ] , the highest number of suicides in japan occurs on mondays. another survey of men and women, conducted by a company, found that the largest number of respondents in all age groups said they felt most depressed on mondays. on the other hand, it is easy to imagine the mood being more positive on weekends and holidays. the proposed method appears to express the rhythm of the mood change over weekdays and weekends in an instructive manner, while such a rhythm is not very clear in the comparative method. in particular, the proposed method clearly shows the dip on monday (or on tuesday if monday was a national holiday). from this result, we argue that the proposed method explains the weekly mood rhythm better than the comparative method. our second evaluation aims to reveal how the national mood score changed during the covid- pandemic in . in this case, we looked at the change in the daily national mood scores on every sunday from the beginning of the year to the end of july, in two different years, and . (the most recent stored historical data covering such a long term was the data for year ; due to an internal infrastructural change, we could not retrieve the equivalent data for year .) we chose sundays since every sunday is a holiday; for other days of the week, we assumed that the analysis results would be difficult to discuss because of occasional holidays. on this dataset, we first calculated the national mood score for each day of the period. then, for each year, all the scores were normalized to the score of the first sunday of that year.
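the normalization step can be illustrated with the following sketch; the input layout (one national score per sunday, with year and date columns) and the matching of the two years by the n-th sunday of the year are assumptions made for this example, not necessarily the exact procedure behind the figure described next.

```python
import pandas as pd

def normalize_to_first_sunday(sundays: pd.DataFrame) -> pd.DataFrame:
    """Normalize each year's Sunday scores to that year's first Sunday.

    `sundays` is assumed to hold one row per Sunday with columns
    'year', 'date' and 'national_mood_score'.
    """
    sundays = sundays.sort_values("date").copy()
    first = sundays.groupby("year")["national_mood_score"].transform("first")
    sundays["normalized"] = sundays["national_mood_score"] / first
    return sundays

def relative_to_baseline_year(sundays, target_year, baseline_year):
    """Ratio of the target year's normalized score to the baseline year's,
    matched by calendar position (the n-th Sunday of each year)."""
    norm = normalize_to_first_sunday(sundays)
    norm["nth_sunday"] = norm.groupby("year").cumcount()
    pivot = norm.pivot(index="nth_sunday", columns="year", values="normalized")
    return pivot[target_year] / pivot[baseline_year]
```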
the result is illustrated in figure . the x-axis shows the date, and the lines represent the daily national mood scores. the value on the y-axis is relative to the score of the same day in year : a value greater than indicates a higher score in the same period in than in , and vice versa. the bar graph shows the number of covid- patients in japan in [ ] . surprisingly, the results show that the peaks of the covid- infection waves and the degradation of the national mood score are synchronized. the peak of the first covid- wave was april (number of patients: , ); on exactly the same day, the mood score dropped to its bottom of . . the peak of the second wave was august (number of patients: , ), and the values around the second bottom of the score were . (august ) and . (august ). on the other hand, we cannot confirm such a clear tendency with the comparative method. it is very interesting to observe the tendency for the mood to become more negative as the number of covid- patients increases, and more positive as the number of patients begins to decrease. in , japan had originally planned to host the tokyo olympics and paralympics, and the mood in japan was positive at the beginning of the year. however, covid- began to rage, the olympics were postponed, and various economic activities were restricted. the number of corona cases began to rise in japan, and the public was frightened by it. then, as the first wave subsided, the mood turned positive again with the restoration of economic activity. however, when the second wave started, the mood turned negative again. the graph appears to have successfully tracked the tumultuous changes in japanese people's mood in . the most remarkable point about the proposed model is that the period during which we collected the sensor data and search queries for model building was from october to december of , which is before the covid- pandemic. this means that there is no possibility that the trained model contains rules about such search queries, for example "the query covid- is a negative feature for mood estimation". again, there is no ground truth on the "national mood". however, from the fact that we can nevertheless observe a score trend inversely correlated with the number of covid- patients, we conclude that the national mood score obtained with the proposed method matches our intuition better. the next step in this study is the evaluation of users' sentiment in real services. our plan is to develop an actual service that can differentiate advertisements and recommendations based on mood scores. however, there are three problems with this approach. first, we need a method for estimating in advance which advertisements are relevant to which type of mood; once we build such a methodology, it will be possible to evaluate the actual advertisements. another task is to improve the model performance: in this study, we adopted a white-box model in order to examine the effectiveness of the model qualitatively, and we hope to employ models focused on precision and recall towards further performance improvement when we conduct real-world tests. finally, classification of other affective statuses (beyond mood) should be possible; we expect that the same framework can be used to model a wide range of affective statuses, and our future work includes building such models and evaluating them extensively on our services. affection-awareness is one of the key components of human-centric information services. however, particularly in the real-world web field, estimating such statuses of the user is yet to be realized.
we proposed a novel methodology for estimating web users' mood based on the combined use of their search queries and mobile sensor data. our extensive data analysis revealed multiple interesting results, including the existence of advertisements for which mood-status-based delivery would be significantly effective, and the changes of the national mood score in the weekly rhythm and in the covid- pandemic situation.

references:
- activity recognition from user-annotated acceleration data
- distinctions between emotion and mood
- modeling public mood and emotion: twitter sentiment and socio-economic phenomena
- random forests
- how's my mood and stress?: an efficient speech analysis library for unobtrusive monitoring on mobile phones
- how do you feel? interoception: the sense of the physiological condition of the body
- the expression of the emotions in man and animals
- real-time facial feature extraction and emotion recognition
- darwin and facial expression: a century of research in review
- what the face reveals: basic and applied studies of spontaneous expression using the facial action coding system (facs)
- automatic facial expression analysis: a survey
- aware: mobile context instrumentation framework
- projective testing of diurnal collective emotion
- towards estimating computer users' mood from interaction behaviour with keyboard and mouse
- the spread of emotion via facebook
- experimental evidence of massive-scale emotional contagion through social networks
- emotion interaction system for a service robot
- moodscope: building a mood sensor from smartphone usage patterns
- daily mood assessment based on mobile phone sensing
- white paper on suicide prevention in japan. ministry of health, labour and welfare
- emotions evoked by common words and phrases: using mechanical turk to create an emotion lexicon
- measurement and analysis methods of heart rate and respiration for use in applied environments
- ios crowd-sensing won't hurt a bit!: aware framework and sustainable study guideline for ios platform
- real-world product deployment of adaptive push notification scheduling on smartphones
- interruptme: designing intelligent prompting mechanisms for pervasive applications
- emotional responses to pleasant and unpleasant olfactory, visual, and auditory stimuli: a positron emission tomography study
- evaluation of mental workload with a combined measure based on physiological indices during a dual task of tracking and mental arithmetic
- frustrating the user on purpose: a step toward building an affective computer
- real-time facial expression recognition on smartphones
- harnessing twitter "big data" for automatic emotion identification
- an analysis of mental workload in pilots during flight using multiple psychophysiological measures
- summary of new corona virus infections. yahoo! japan corporation

key: cord- -utqyuy n authors: zamani, efpraxia d.; pouloudi, nancy; giaglis, george m.; wareham, jonathan title: appropriating information technology artefacts through trial and error: the case of the tablet date: - - journal: inf syst front doi: . /s - - - sha: doc_id: cord_uid: utqyuy n

the concept of appropriation is of paramount importance for the lasting use of an information technology (it) artefact following its initial adoption, and therefore its success. however, quite often, users' original expectations are negatively disconfirmed, and instead of appropriating the it artefact, they discontinue its use.
in this study we examine the use of it artefacts following negative disconfirmation and use grounded theory method techniques to analyse blogposts, collected between march – july , to investigate how users appropriate or reject the tablet when technology falls short of users’ expectations. our findings show that users overcome negative disconfirmation through a trial and error process. in doing so, we identify that users appropriate the tablet when the attained benefits significantly outweigh the risks or sacrifices stemming out of its use. we discuss our contribution within the context of the appropriation literature, and highlight that the success of it lies with the user’s success in identifying personal use scenarios within and across diverse contexts of use. contemporary information technology (it) devices are boundary spanning and accommodate different contexts of use, covering both professional and personal use scenarios. such it devices can be smartphones and tablets, among others, and their adoption and use is largely volitional (schmitz et al. ) , which means that the individual user is able to decide and control their use. this indicates a great heterogeneity of potential uses, and further signifies increased user control over a device's adoption, modification, and even rejection. these two points create an important challenge where the user can easily switch between contexts, with it having to satisfy their requirements irrespective of the changing environments. this is important because the success of an it artefact resides with the user identifying a benefit in it use against the background of personal use scenarios. for this reason, today there is a large body of research that examines why users accept an it artefact and how they make use of it (barnett et al. ) . often, these studies draw from theories, such as the technology acceptance model (davis and warshaw ) , the unified theory of acceptance and use of technology (venkatesh et al. ) and their variations (kim and garrison ; e.g., venkatesh et al. ; venkatesh and davis ) . these theories place the emphasis on the factors that drive user decision with regard to the acceptance or rejection of the technology (aggarwal et al. ) . however, because these theories place the emphasis on the preliminary stages of user interaction, less or no emphasis is inescapably given in how users actually make use of the technology, which exerts a greater impact on its viability (venkatesh et al. ) . the aspect of how users make use of it is typically examined by post-adoption user behaviour studies (zamani et al. ) , and often through the lens of appropriation which explains how users adapt and modify the it or even refine their it use in order to achieve their goals (clark ) . studies often focus on the fit among user, task and it (e.g., barki et al. ) or the variations of appropriation acts and adaptations (e.g., pallud and elie-dit-cosaque ) . however, existing research has two main shortcomings. first, there is a key assumption that a fit can be achieved via appropriation. second, research mainly focuses on organisational systems within enterprise-focused contexts (e.g., a. bhattacherjee and premkumar ; jasperson et al. ) . even when the unit of analysis is set at the individual level, the user is approached as an organizational member (e.g., ahuja and thatcher ) . 
as a result, appropriation studies often miss the particularities of it where appropriation is not necessarily side-stepping top-down imposed use patterns but equally an effort to satisfy user goals. in this paper, we are interested in seeing "what is the process of appropriation of it devices by individual and volitional users following negative disconfirmation?" this is of distinct importance. disconfirmation denotes the discrepancy between the user's original expectations and their perceptions regarding the actual performance of the it device post-usage (a. bhattacherjee and premkumar ) . positive disconfirmation suggests that the user is pleasantly surprised but negative disconfirmation is crucial as it often leads to discontinuance behaviour and the abandonment of the it device. it is thus imperative to understand how, rather than abandoning the it device, the user can be successful and incorporate it successfully, within their portfolio of other it devices. appropriation signifies the situation where the user has managed to overcome the said discrepancies, and made the it device their own, using adaptations and modifications (a. k. barrett ; clark ) . most importantly, appropriation leads to habitual norms and routines (dennis et al. ) , and as such, to the lasting use of the it device (wu et al. ) , which is what makes an it device fairly successful. we use the tablet as an exemplary case, and specifically the ipad. we chose to focus on the ipad because since its launch, it has been consistently popular among the mass consumer market, and because it has attracted the attention of both practitioners and end users (zamani et al. ) . we use grounded theory method (gtm) techniques specifically because our aim is to develop a theory grounded on the data, using systematic ways for data collection and analysis, while taking stock of prior research and with the objective to enrich the appropriation literature. our empirical material consists of blog posts, covering years. we analysed this material following the approach put forward by volkoff et al. ( ) by drawing iteratively from the relevant literature and our data, and examining the empirical material through the lenses of competing theories. this resulted in the identification of behavioural patterns that are compatible with the concept of trial and error. our contributions are twofold. first, we introduce trial and error as a new variation of appropriation that addresses some of the observed shortcomings of the existing variations. second, we study user behaviour with contemporary, highly popular it devices. as their use is not necessarily organizationally mandated, their success relies on the users identifying personal use scenario(s) that can serve them within and across diverse contexts of use. against this background, understanding the processes users go through while moving from negative disconfirmation to appropriation is of crucial importance for the success of the examined it devices within these diverse contexts. in what follows, we first discuss prior work on appropriation. we then present the methods used and our findings. then, we discuss our findings in relation to the existing literature and theorise on trial and error. we conclude our paper with the contributions of our research and its limitations. appropriation is a prerequisite for the sustained and lasting use of it systems as it feeds into the formulation of norms and routines (dennis et al. 
) by supporting users in developing personal use scenarios (mäkelä and vellonen ; wu et al. ). as such, appropriation is core for the success of it artefacts. in the next section we examine the different variations of appropriation. clark ( ) defines appropriation as the "situation where the user starts by recognizing the potential value of a particular it and manages to narrow the absorption gap between the requirements of the it and its own limited capacities" (p. ). similarly, carroll and fidock ( ) describe the concept as "seeking a relationship with the technology so that it provides benefit to the user through supporting practices, enabling new -and beneficial -practices or removing ineffective practices" (p. ). existing research emphasizes the impact of appropriation on performance (e.g., beaudry and pinsonneault ; desanctis and poole ) with several studies analysing its different variations, ranging from workarounds (e.g., alter ) and adaptations (e.g., elie-dit-cosaque and pallud ) to improvisations (e.g., mcgann and lyytinen ) . by far, the most well research variations are those of adaptations and workarounds. according to beaudry and pinsonneault ( ) , adaptation guarantees a fit between technology and user, as a result of the user changing routines and habits, enriching their skillset, and changing the technology with the aim to achieve their goals. schmitz et al. ( ) have further extended this concept through the lens of adaptive structuration theory and discuss that, as a result of the technology's malleability, users adapt the technology and their tasks either subtly and progressively (exploitive adaptation) or more exploratively, by reinterpreting the technology and its features (explorative adaptation). the explorative type of adaptation is quite similar to the enhanced use of it, proposed by bagayogo et al. ( ) , where the user attempts to find new ways of using features of an it system. this may include using previously unused or underutilised features (bagayogo et al. ) . the exploitive type of adaptation strongly resembles the deep structure usage, proposed by desanctis & poole (desanctis and poole ), whereby both approaches focus on the extent and the intensity of the use of it features for task completion, and the user efforts to progressively use more and more it features and functions. while in all the aforementioned approaches there is a common element of learning (kwahk et al. ) where users recognise and put in use new approaches to task completion (barki et al. ) , there are differences as well. namely, enhanced use of it emphasizes that it use may change over time (bagayogo et al. ) , whereas schmitz et al. ( ) and beaudry and pinsonneault ( ) focus more on the identification of a fit among task and technology or technology and user. on the other hand, deep structure usage is a rather more demanding form of usage (tams et al. ) , as it entails that users comprehend how the system is structured (burton-jones and straub ) . in the effort of identifying a fit and appropriating an it device, it is not used as originally designed (schmitz et al. ) . user modifications may or may not be in the spirit of the original design (desanctis and poole ), and essentially become workarounds (alter ) as the user engages with it outside recommended rules (ferneley and sobreperez ) . workarounds are often seen as a form of resistance behaviour within an organizational setting. 
yet, numerous studies to date show that, when workarounds become stable over time, they are merely an indication of an inadequately designed information system (ferneley and sobreperez ) , and should be seen as integral for the completion of day-to-day work (e.g., azad and king ) . as workarounds are often developed for tackling in situ newly emergent shortcomings, they are closely related to revisions of it use (sun ) and improvisation acts (morrison ) . revisions of it use concern primarily the way users revise their it use due to novel and challenging situations, when users are faced with discrepancies or other deliberate initiatives (sun ) . such revisions allow users to meet their needs and overcome inadequacies of it and processes (lee et al. ) . within this context, users may try new features, substitute older ones, or combine some and even repurpose them, in their attempt to make better use of existing and new it features, functions and extensions (sun ) . in this sense, revisions of it as an appropriation variation share a common focus with enhanced it use on how it use may change over time, as it emphasizes that "a person's [technological features in use] is always in flux" (sun , p. ) . such novel uses of it in light of shortcomings, resource shortages and other workplace challenges can equally take the form of improvisation acts (morrison ) . combining, recombining and repurposing it denotes an attempt to try out different things in order to solve problems. in doing so, users may cast a wide net that goes well beyond what they know, which allows them to innovate and identify novel solutions (scheiner et al. ) . while trying to find new ways of using it, users essentially try to innovate (tams et al. ) , and in doing so they make it their own (ahuja and thatcher ) . workplace innovation with it suggests that users seek to incorporate it in their processes (wu et al. ) while attempting to do away with the restrictions enacted by it itself (schmitz et al. ) . the difference between trying to innovate with it and improvisation is that the first places the emphasis on one's goals about the outcome of the interaction (ahuja and thatcher ) , while the second entails thinking and acting "simultaneously and on the spur of the moment" (ciborra , p. ) . therefore, while trying to innovate can be seen as one's behavioural coping toward identifying new uses for existing systems and ways to support new tasks (wu et al. ) , improvisation is often seen as unpredictable because users need to work with whatever is available (ciborra ) as a result of, e.g., resource shortages (morrison ) . table offers a summary overview of the different appropriation variations presented here, offering additional information with regard to the context of study and the unit of analysis typically adopted by existing studies. we note that, while some studies focus on the individual level, the majority focuses on organizational contexts where it use is mandatory. we further highlight that, while the previous discussion has highlighted their differences, the common denominator across all variations is that of the user making the it their own, irrespective of the exact process adopted.

table : summary overview of appropriation variations
- trying to innovate (innovating with it): allows users to identify new it uses for task completion; it entails that the system's value can only be realized when it is fully integrated into one's workflow (ahuja and thatcher ) . trying to innovate is goal-directed, can be a team learning behaviour (magni et al. ) , and is influenced by one's personal innovativeness (haug et al. ) .
- revisions of it use: users revise their it use as a result of emerging situations and discrepancies, with the aim to overcome disconfirmation and appropriate it (sun ) . it revisions may also stem from the introduction of new it features, functions and extensions that support the system's increased use (bagayogo et al. ) .
- workarounds (ferneley and sobreperez ) : they can generate both stability and fluidity, (re)creating routines (rossi et al. ) .
- improvisation: thinking and acting "simultaneously and on the spur of the moment" (ciborra , p. ) , with the aim to find viable solutions (ciborra ) ; it therefore requires expertise, competence and intuition (fernandes ) , so that users are able to identify relevant solutions (orlikowski ) , and it offers flexibility at the individual level (rossi et al. ) .
- level of analysis: ranges from organizational members (barki et al. ; beaudry and pinsonneault ) to the individual.

in what follows we highlight the current shortcomings of the appropriation literature. first, all variations recognize that how users use it may be different from the originally designed instrumental use. this suggests that users move from 'technology-as-designed' to 'technology-in-use' by engaging with the technology, exploring its potential and modifying, eventually, its features as per their requirements, which is what leads to the appropriation of it (lapointe and beaudry ) . second, all variations are focused on identifying a fit among task, technology and user. this is done either by modifying the technology, the task, user behaviour or any combination of these. the assumption here is that a fit does exist and can be found by the user. however, a user may follow the same processes that are said to lead up to appropriation, without achieving the desired outcome, i.e., appropriation itself, eventually rejecting the it. this possibility is under-investigated in the existing literature. third, the extreme majority of studies focuses on enterprise-level systems and/or on organizational contexts (see table ). for example, improvisation studies focus on the improvisational capability of organizations (e.g., chatterjee et al. ) , or the improvisation acts of organizational members during fire-fighting situations (alblas and langerak ; repenning ) . such an emphasis is expected considering that investments in enterprise software are costly (bagayogo et al. ) and reluctance, or even failure, to adopt enterprise it endangers performance and profitability. this assumes that users, while attempting to appropriate the it, will prioritise the requirements of the organization over their own. it also assumes that there is a restricted set of it systems which has been imposed top-down by the organization. as such, even when the unit of analysis is that of the individual, users are treated as organizational members with the assumption that their commitment to make things work may be low. however, individual users under their volitional control may proceed at their discretion with the appropriation of it; in doing so, they may employ a multitude of adaptations and modifications, if and when required (schmitz et al. ) and without constraints imposed by a top-down hierarchy; they may not abandon the process when they identify a good enough solution, or they may abandon the process altogether should they consider that a tentative solution is either non-existent or too difficult (swann ) .
further, under the user's volitional control, where a top-down description of what constitutes suitable use does not exist (schmitz et al. ) , the acts of adaptation, exploration, and appropriation can be more flexible and can indeed lead to rejection without (fear of) sanctions; time and flexibility allow users to discover possibly unexpected usages that can increase their satisfaction with the it artefact. as such, research should focus more on use contexts beyond the typical organizational one, so as to appreciate the richness of the phenomenon and how it may be adapted when its use is not mandated and users are able to choose from a wider set of action possibilities. we seek to understand "what is the process of appropriation of it devices by individual and volitional users following negative disconfirmation?" this research question entails a broad focus around post-adoption use behaviour, but is squarely rooted in the discrepancies users experience between initial expectations and realities in use. this broad focus further allows us to consider the problems users may encounter while attempting to fulfil a task, and the different solutions they may enact with the it device in question, or any other it device at their disposal. to address this question, we designed a qualitative study using grounded theory method (gtm) techniques. gtm is often used for describing process-based phenomena, and offers data collection and analysis that support the emergence of theory from the data, while being guided by existing relevant theories (urquhart ) . we follow a bottom-up approach in line with the gtm paradigm (e.g., m. barrett and walsham ; boudreau and robey ; volkoff et al. ), meaning that trial and error emerged from our preliminary analysis of the collected data, when we observed that users, first, had an expressed interest in using the tablet for various reasons and that, failing to use it in an as-is fashion, they were trying out different things. this observation led us to proceed with subsequent data collection in order to identify either similar or dissimilar behavioural patterns, where the second phase of data collection was influenced by the concepts emerging from the first phase (theoretical sampling) (urquhart ) . our gtm study was designed specifically around the tablet and occurrences of negative disconfirmation with it, in order to capture the processes of appropriation following such occurrences. we chose to focus specifically on the ipad. the ipad, as an exemplar case of a tablet, is particularly popular among users, offers a fairly consistent user experience across its numerous generations (zamani et al. ) , and thus allows for maximum similarity in the data, which in turn leads to generating and verifying basic properties and conditions for our core constructs (urquhart ) . for our gtm study, we followed the stages of analysis proposed by glaser (glaser ) and charmaz (charmaz ) : coding and theorising around the open (or initial), selective (or focused) and theoretical codes, writing up memos and theorising around these too, and integrating and linking up our codes and core categories through a constant comparative method, finally developing our trial and error theory of appropriating. these are discussed in detail in the next two subsections. when we began looking into the tablet and how users use it, we noticed that many bloggers were documenting their experience with the tablet in their personal blogs.
within them, the tablet users were offering narratives of their everyday life, sharing their goals and experiences with the tablet as well as detailed accounts of their interaction, the problems they were encountering and the strategies employed towards solving these. we therefore considered it safe to assume that these bloggers were able to provide candid descriptions in their blog posts of their everyday interaction with the device, of the it uses they were hoping to develop or were successful in developing with the tablet, and of the actions that allowed them (or not) to appropriate it (hookway ; zamani et al. ) . the empirical material of this study therefore consists of blog posts. blog posts often contain narratives of everyday life, where bloggers share their experiences and their goals with their readership. in our case, the collected blog posts are authored by ipad users who offer detailed accounts of their interaction, the problems they encountered as well as the solutions they implemented towards resolving them. the material was collected in two stages; the first spans the period between march -august , and the second between january -july . the beginning of our study ( ) and our first data collection phase (march -august ) overlapped with the period when the first tablets were introduced to the mass consumer market (i.e., the ipad). further details on data collection can be found in appendix . during this phase, our preliminary data analysis showed that, while users were keen to identify a fit for the tablet in their everyday, the process was rarely straightforward. instead, users exhibited a highly explorative behaviour, trying different use scenarios and often incorporating the ipad into their device portfolio in novel ways. the use of gtm allowed us to identify negative disconfirmation as a fairly relevant conceptual category for our study, with appropriation and rejection as outcomes of a trial and error process in which the user tries out different things in order to identify solutions to this negative disconfirmation. in light of this, we proceeded with a second phase of data collection (january -july ), following theoretical sampling of the same type (urquhart ) . our aim was to increase and verify the usefulness of each of the emerging categories and to establish the conditions for each. as such, theoretical sampling for the second stage entailed focusing exclusively on sampling blog posts that would help us achieve meaning and content saturation (hennink et al. ) while selectively sampling with the emergent core category in mind (glaser ) . for both data collection stages, we examined the collated blog posts against our inclusion criteria. namely, each blogpost had to: a) contain a rich description of the blogger's interaction with the ipad, b) describe voluntary use of the device for both personal and professional use scenarios, c) contain a description of negative disconfirmation, i.e., the user attempted to use the device in a particular way but for some reason failed to do so, and d) describe an underlying effort to overcome disconfirmation. these criteria allowed us to collect material that contains contextual and processual information, supporting us in addressing our research question.
the chronological difference between the two stages means that, while during the first stage the tablet was still a novel device, during the second stage users had become directly or indirectly familiar with it, and had a clearer idea about their goals and expectations regarding the device as a result of others' experiences and advertising. however, the purpose of our study is not to compare and contrast expectations, where the 'starting point' of each experience would undoubtedly be critical. instead, we are only interested in examining how users attempt to overcome negative disconfirmation through trial and error behaviour, regardless of the specificities of the technical features that prompted the disconfirmation. therefore, we do not consider the different tablet generations a critical element for our interpretations, precisely because the data reveal a consistent behaviour of trial and error across the different generations of the technology. all in all, the final data pool consists of blog posts, authored by unique english-speaking tablet users (table , appendix ). of them, bloggers are male, are based in north america (usa and canada), and the majority of the remaining blog posts are authored by europe-based bloggers. most users have managerial positions or freelance. we began our analysis with a preliminary examination of our data and moved to open, selective and theoretical coding following the glaserian paradigm (glaser and strauss ) and in line with urquhart's and fernandez's recommendations ( ). while open coding, we coded line by line, or at word level, and more rarely full paragraphs, often using in vivo labels and drawing from existing literature (see fig. for some examples). this was done by the first author in consultation with the second author, discussing the relevance of the used labels and the consistency of the coding. an overview of the process of data collection, analysis and interpretation can be found in table . the stage of open coding is critical as it acts as a sensitizing device for data collection and analysis, and for further examination of the existing literature (m. barrett and walsham ) . indeed, during this stage, we noticed that users were recounting some disconfirmation with the tablet, more often than not a negative one, following which they began trying out different solutions in order to overcome it and address the experienced issues. in many instances, this led to further 'errors' in their interaction with the tablet. we moved to selective coding by focusing our coding around the codes that seemed to relate to the emerging categories, while identifying their variants and how they relate to each other (hekkala and urquhart ) , drawing iteratively from the literature, and constantly reviewing and revising the evolving coding scheme (wiesche et al. ) . the descriptions of our codes and core categories can be found in table . in this section we present our findings, organized around our three core categories of negative disconfirmation, trial and error and outcomes, in order to illustrate how negative disconfirmation emerges, the process of trial and error behaviour, and the conditions for each of the two main outcomes of this process. inspired by other researchers (e.g., korica and molloy ; vaast and levina ) , we use vignettes. vignettes are often used for the presentation of findings as they constitute concrete examples, which are carefully selected as "illustrations and exemplars of particular concepts" (swan et al. , p. )
, and in this case serve as a way to provide a rich description of different examples of trial and error behaviour without decoupling these from their contextual conditions. we chose the particular vignettes because they are rich descriptions of negative disconfirmation, trial and error and outcomes, but also because they show the variety of contexts within which users tried (and possibly failed) to appropriate the ipad.

table : overview of the process of data collection, analysis and interpretation
first data collection (march - august ):
- familiarisation: review of the empirical material; memoing and note taking; emergence of initial ideas (trial and error).
- open coding: initial coding line by line, occasionally at word level (cf. table ).
- selective coding: open codes organised around the core/initial ideas (trial and error), grouping codes together, primarily guided by the preliminary research question (cf. table ).
- reflection: review of codes and themes. the coding scheme was reviewed by the first two authors to ensure it reflects the emerging themes, and that codes are mutually exclusive and exhaustive (miles and huberman ) . negative disconfirmation emerged as particularly prominent. we examined theoretical saturation (not achieved) ➔ a second data collection was decided on the basis of theoretical sampling to achieve it.
second data collection (january - july ):
- new data were collected, focusing specifically on negative disconfirmation occurrences, and on a) newer generations of ipads, b) volitional contexts of use, where c) the ipad is used both for professional and personal use cases by similar types of users (boundary conditions), with a view to achieving meaning and content saturation of our codes (hennink et al. ). this stage was guided by theoretical sampling (urquhart ) and helped us achieve theoretical saturation. new material was added to the main pool, and open coded line by line, or at word level.
- selective coding (redone): integration of new selective codes, reframing of previous ones, identification of the properties and the components of the core categories (cf. table ).
- reflection: the coding scheme was reviewed for consistency by the first two authors. theoretical saturation was examined (no new themes emerging and the theoretical categories were saturated as a result of coding).
- theoretical coding: reflective elaboration of relationships among categories (glaser ) via constant comparison, using evidence from the data, building on our memos, on the basis of glaser's coding families (glaser ) (table ) , and integrative diagrams (fig. ) , revisiting the literature and developing findings.

fig. : examples of open coding.
"the conventional wisdom on tablets is that they're for consumption not production. you can absorb text quickly and well, for example, but writing is a chore." extract from blogpost "a week with an ipad pro" (b ).
"at this point i don't think any working professionals are going to be able to go all-in on the ipad pro as their daily driver. there are just too many walls and ceilings to bump into right now. however, for a casual user, this device could very well be all that you need." coded at "too many walls and ceilings to bump into"; extract from blogpost "ipad pro review" (b ).

table : descriptions of the codes and core categories of the trial and error schema
- goals. examples: "skeptical" (e.g., b , b , b ), "what always wanted the iphone to be" (e.g., b ), "primary machine broke down/replacing" (e.g., b , b , b ), "it experience" (e.g., b ). users begin interacting with the ipad having some goal in mind. this goal may be very tangible and specific (e.g., replace a pre-existing it or non-it device) or fuzzy and highly explorative (e.g., explore the potential of the tablet). as such, negative disconfirmation seems to be the result of unachievable goals and of gaps between goals and realities in use.
- comparing. examples: "comparing to paper" (b , b ), "comparing to books" (e.g., b , b ), "comparing to a laptop" (e.g., b , b , b ), "comparing to using it with mouse and keyboard" (e.g., b , b , b ), "none that i've tried work all that well" (b ). in setting their goals, tablet users may be influenced by reviews, advertisements and others' experiences, as well as by their own past experience with similar or dissimilar devices. negative disconfirmation surfaces when the user is unable to achieve their goal (e.g., they are unable to read an e-book while lying in bed), and when they compare the tablet with other it and non-it devices (e.g., comparing the ipad to the kindle and physical books, and the resulting experience). it is noted that comparison is continuous: from the moment the user begins interacting with the tablet all the way to finally appropriating or rejecting it, and during their trial and error efforts.
- trying (tentative solutions). examples: "using a smaller keyboard" (b ), "to record notes during patient interviews, both by typing and with a stylus" (b ), used only while "at a table or another flat service [surface, sic]" (b ). trial and error is what users go through in the face of negative disconfirmation in order to overcome it. they do so by adapting the device (e.g., using external add-ons), augmenting it with third-party applications, and even adapting their tasks and workflows. this behaviour is influenced by the user's experience with it and by prior experiences. a tentative solution implies that a) it is one of many possibly equivalent solutions towards overcoming negative disconfirmation, and b) it is later reviewed for its applicability and can potentially lead to further problems (errors).
- error. examples of non-tolerable errors: "lack of speed and accuracy" (b ), "[not] easy for me to mix and match my favourite instruments" (b ), "more fatiguing compared to pen and paper" (b ). examples of tolerable errors: "there wasn't enough power" (b ), "quasi-mobile device, but it's not recognized as one". errors denote problems stemming from the tentative solutions (being not good enough or raising further problems that prohibit the user from achieving their goals); as a result, disconfirmation persists or intensifies. there may be non-tolerable errors, where the tentative solution is not good enough or a solution does not exist, and there may be tolerable errors or no errors, where the tentative solution does not impede further interaction and use.
- appropriating. example: "at first, i used simplenote to sync with scrivener. eventually, i found a better solution, using scrivener, dropbox, and elements. this last solution has worked well for me since i discovered it." (b ). appropriation surfaces as the user transitions to new workflows, by adapting their tasks and their behaviour to the tablet's requirements, or equally by employing tentative solutions that augment the tablet (e.g., hardware or software add-ons) and produce no errors or only tolerable errors. this suggests that the user overcomes negative disconfirmation, achieves their goals, and integrates the tablet into their everyday.
- rejecting. examples: "i gave up and borrowed laptops (one per continent) to do all of my posts, including when i was covering our keynotes at tnw conference. (…) however, in the near future at least, i will haul my laptop on any trip i go on where i'll be blogging" (b ); "i will probably never try reading another book on the ipad again: destroying one of my greatest pleasures with constant discomfort seems like a ridiculous thing to do to myself again." (b ). users reject the tablet because they cannot overcome negative disconfirmation: they continue comparing the new to the old way of completing tasks, and they either deem the tentative solutions not good enough or the errors non-tolerable. as a result, they often regress to their old routines.
- identifying benefits (condition for appropriating). examples: "face the congregation at all times" (b ), "wonderful opportunities for "social" internet surfing" (e.g., b , b ), "a screen that connects me with people" (b ), "once you get used to that, you realize how efficient you are with the lack of distraction." (b ). the outcome of appropriation as a result of trial and error behaviour is subject to the user identifying benefits in using the tablet: identifying benefits is a condition for the appropriation of the tablet. among these benefits is the use of the device together with others, without users isolating themselves from their environment and without hindering their social interactions. identifying benefits allows users to develop their personal use cases, to persevere in finding a tentative solution, and to evaluate the tablet more favourably overall despite initial disconfirmation.
- feeling restricted (condition for rejecting). examples: "too many walls and ceilings to bump into" (b ), "apple will sit and control what you can do with the device" (b ), "the size of the device doesn't let much freedom for taking many photos" (b ), "inability to listen to a video in the background" (b ). the outcome of rejection is more likely when users feel that the tablet restricts them in some way. missing features and functions entail that the user either has to work around them (tentative solutions) or accept them (tolerable errors). if this is unacceptable (a tentative solution does not exist or the error is non-tolerable), the user feels as if the tablet is designed in a way that restricts their activity, especially when compared to other devices.
note: numbers in brackets denote the id number of the blog post. the complete list can be found in table in appendix .
through the analysis of our findings, we see that initial negative disconfirmation with the tablet surfaces in different indicative contexts of ipad use, as a result of a discrepancy between the user's expectations and goals in using the tablet and the tablet's actual performance. below, we present two vignettes to illustrate how negative disconfirmation surfaces. dale (b ) acquired an ipad with the aim to use it as his sole device while travelling for work-related purposes. he is a frequent flyer, and he was motivated to use the ipad because of its battery capacity and light-weight format, which could allow him to be more mobile and remain productive for longer while on the go: "the two most important factors were that the ipad has a killer battery ( + hours no matter what i'm doing) and that it is slim and only . pounds. compare this to my + pound and well over one inch thick laptop that gets at best hours of use (on a -cell battery) -basically i bought the ipad in part to be used as my travel laptop replacement for these reasons" (b ). with this in mind, he expected that the ipad could serve him well while covering conferences around the world. however, this was not the case. dale explains that, despite its strong points, the ipad did not perform as expected and could not function as his sole device. the ipad did not fit well with his workflow because it did not allow him to complete important tasks as part of his job, which results in his negative disconfirmation and in him using other devices in order to keep up with his duties: "i took notes at the dc conference on the ipad, which turned into three posts. however, and here is the main moral to this post -all these posts came at best hours after the sessions because i didn't actually post any of these stories to wordpress using the ipad. (…) so to make a long story short, i gave up and borrowed laptops (one per continent) to do all of my posts, including when i was covering our keynotes at tnw conference." (b ). gordon (b ) purchased the tablet with the aim to explore and experiment with it while recovering after his operation. contrary to dale, gordon does not have a clear goal with
identifying benefits allows users to develop their personal use cases, persevere in finding a tentative solution and evaluate overall more favourably the tablet despite initial disconfirmation. feeling restricted "too many walls and ceilings to bump into" (b ), "apple will sit and control what you can do with the advice" (b ), "the size of the device doesn't let much freedom for taking many photos" (b ), "inability to listen to a video in the background" (b ) the outcome of rejection is more likely when users feel as if the tablet restricts them in some way. missing features and functions entail that the user either will have to work around them (tentative solutions), or accept them (tolerable errors). if this is unacceptable though (a tentative solution does not exist or the error is non-tolerable), the user feels as if the tablet is designed in a way that restricts their activity, especially when compared to other devices. note: numbers in brackets denote the id number of the blog post. the complete list can be found in table in appendix regard to his tablet use. instead, he clearly notes that he was motivated to acquire it for two main reasons: the first was his prior experience with his iphone, and the second his desire to find out about the ipad's merits and potential: "i bought an ipad last week because i love my iphone so much (…) and also because i figured that, since i was going to have a lot of time on hands recovering from my surgery, it would be fun to have a cool new toy. (…)" (b ). while the goal is not as specific as in dale's case, gordon similarly compares his tablet experience to his experience with other it and non-it devices, and notes that the tablet performs less satisfactorily, in part due to its form factor: "the first day i had it, i rented a movie i have always loved, movie_ , and tried to watch it for over an hour before simply giving-up. (…) a laptop, buy [sic] the way, would have been much easier because you can adjust and hold the angle screen more easily." (b ). based on the collated narratives, we see that users acquired the tablet with the expectation to either use it within an occasionally well-defined use scenario (vignette ) or experiment with it in an attempt to explore its potential (vignette ). in both cases, negative disconfirmation ensues a comparison. first, users assess the tablet's success in helping them achieve their goals and meet their expectations, and compare how they used to complete tasks and other activities with their other it and non it artefacts to how their workflow changes by using the tablet instead. further, regardless of whether users acquired the tablet for exploitation (vignette ) or exploration (vignette ), negative disconfirmation denotes that the it artefact fails to meet the user's goals and expectations. trial and error is a sequence of attempts to bridge the gap between goals and actual experience, and while the user tries out one or more tentative solutions, in order to overcome their negative disconfirmation. the following vignettes illustrate trial and error behaviour, where ipad users try out different tentative solutions with the aim to tackle their initial negative disconfirmation. peter (b ) is a musician and music editor who has been using his macbook pro and a specialist application (mainstage) to emulate "the sounds of pro keyboards like roland rd pianos and synths when playing live". 
he is now interested to see if an ipad-centred setup can work equally well for live performances, and therefore replace the laptop. to do this, peter needs to use "a decent "real" midi controller keyboard", but, to his disappointment, on the one hand, "ipads have neither usb nor midi inputs", while on the other hand, such larger keyboards typically requires an external power source (negative disconfirmation). to overcome this, he turned to an adaptor for connecting a keyboard to the tablet that uses the ipad as a power source (tentative solution). while "[t] his worked well, and the ipad was able to power the keyboards for hours", peter "encountered a small glitch when [he] first plugged in [his] midi controller", when his tablet showed an error message that "there wasn't enough power" (tolerable error). however, he felt confident that this "was meant to work"; therefore he "started experimenting" (trying), and discovered that the error was due to the sequence of plugging in cables and adaptors. he next raises the issue of latency as there is a delay between him stroking a key and receiving a response from the tablet. this is however "hardly noticeable, or unnoticeable" (tolerable error) and although he does "have the occasional problem (error) [ …], resetting the ipad makes it responsive again" (tentative solution). to better control sounds and effects, peter had to map the keyboard on the ipad application. yet, while comparing the ipad-centred versus the macbook pro-centred set up, he explains that "there doesn't seem to be a way of mapping all of those useful buttons, knobs and sliders on my keyboard to do anything useful" (negative disconfirmation). after further attempts (trying), he arrives at the conclusion that this is a limitation of the application rather than of the keyboard (no solution). michael (b ), a medical student during his clinical year, has been finding the tablet both useful and versatile for his studies and patient care. while at the hospital, he needs access to medical records that are stored securely into a dedicated content management system. however, he was not able to access this system directly from his tablet (negative disconfirmation). to overcome this obstacle, he tried out using the citrix receiver, a freeware desktop-virtualization package (tentative solution). this allows him to access the centralized host and to "tap into [the] emr system" (no error). however, "[t] here are some ways where [he's] been less than impressed with ipad". michael notes that the ipad is "not a very good input tool" due to the "lack of speed and accuracy" in capturing information while talking to his patients (negative disconfirmation). he has tried "to record notes during patient interviews, both by typing and with a stylus" (tentative solution), but he doesn't consider this set up as satisfactory because, while being "too busy making sure that the […] notes [a] re accurate", he feels the tablet is a barrier between the patient and himself (non-tolerable error). in addition, he compares the tablet to a regular notepad, and deems that "[p] aper and pen is still superior in a lot of cases". harriett (b ) is a litigator and a consultant who switched from the ipad mini to the ipad pro, with the aim to see if she could "be doing more with [the] ipad" while on the go and while meeting clients. 
she begins her narrative by saying that the ipad mini "has never been [her] preferred device […] [for] productivity related activities", such as preparing presentations, legal briefs and the like. she considers that the larger form factor and the recently made available multitasking features have made "these types of tasks easier […] than they ever were before". however, she describes her personal experience as a "compromise" when compared to her laptop experience. she suggests that "the biggest problem with the ipad pro was it was just too darn big". she could only use it "at a table or another flat service [surface, sic]", and being almost as big as a laptop, carrying the ipad pro required some effort: "it wasn't something that could be thrown in a purse and taken on a whim […] . if i was going to go the trouble, i personally would have preferred to have my mac" (negative disconfirmation). harriett reflects meeting with clients and describes "typing directly on the glass [as] a clunky experience" (negative disconfirmation). she considered pairing an external keyboard (tentative solution), but in her opinion this would make the tablet even more similar to a laptop, where there'd be a physical barrier between her and her client, which is "the situation [she] was trying to avoid by using the ipad in the first place" (non-tolerable error). she further compares note taking on the ipad with a stylus to note taking on a notepad and describes the former as "more fatiguing compared to pen and paper due to having less friction and having to apply more pressure to control the pencil" (non-tolerable error). russ (b ) is a design professional who uses the ipad as his "primary machine", having been a windows user for years. he had been unpleasantly surprised in the beginning. due to his profession, he often has to import pictures and videos from his camera to edit them on his devices. however, he quickly realized that he cannot import these files from the camera's sd card to the ipad (negative disconfirmation). to overcome this obstacle, he tried out a camera connection kit (tentative solution), which seems to be working well (no error). however, with regard to image editing, russ notes that "the biggest issue is image resizing […] . i've found it impossible to resize an image to a specific pixel value without also having to calculate the height value too"] (error). he has attempted to find an easier way around this, but after trying out different third-party applications (tentative solution), none of which seems to work (non-tolerable error). despite that such problems require him to employ "cumbersome methods to achieve the desired results", his productivity has not decreased as a result of him using the ipad as his primary device. these vignettes ( - ) illustrate that users apply a tentative solution so as to try and overcome their initial negative disconfirmation. following this, users may be faced with further issues, namely tolerable (vignette ) or non-tolerable errors (vignettes , , ), which prompt subsequent trials, until users identify a good enough solution or consider that no solution exists. equally they may be faced with no errors (vignettes , ) at all, when the applied tentative solution is considered a good enough one. in what follows, we describe appropriation and rejection as the two major outcomes of trial and error, and we discuss the conditions for each outcome. 
through trial and error behaviour, users move from tentative solutions to good enough ones that help them overcome negative disconfirmation. in such cases, these solutions entail the adaptation and modification of the technology, where the user augments the tablet with, e.g., external keyboards (vignette ), specialized applications (vignettes , , ), and adapts and modifies their own habits and routines (vignettes ). accordingly, the outcome of trial and error may be the appropriation of the it artefact or its rejection. in the vignette that follows, we illustrate the outcome of appropriation and highlight its conditions. garry (b ) has been an ipad user for some time. during this time, he has moved much of his work from his macbook pro and imac to the ipad. he notes, however, that "there were a few things [he] needed such as a keyboard case, writing app, etc." in order to do so. in his blogpost, he takes his readers through his journey of how he chose his current set of applications, as well as how he augmented his tablet with an external keyboard that makes him feel "like typing on a macbook pro" while offering "multiple viewing angles like a laptop". he chose these solutions as a result of trial and error, like all other users, and he further explains that while he "had no problems finding those things, the real challenge was in changing [his] os x-centric mindset". this entailed "unlearn[ing] some of [his] long-time habits" and "judg[ing] ios on its own terms rather than constantly comparing it to os x." as a result, garry notes: "the ipad is a more personal experience and i tend to have it with me wherever i am in the house. (…) since i got my ipad air , i have hardly even picked up my iphone plus when i'm in the house. and the ipad has cut down the use of my imac drastically, and mostly left the desktop computer relegated to work duties. don't get me wrong, i still love my imac and iphone plus, but neither of them can compete with the ipad air for certain uses such as games, reading, comics, etc." garry recognises that the ipad can be of value to him, offering a more personal experience, specifically for some activities such as reading and gaming. to "narrow the absorption gap" (clark , p. ), he has opted for an external keyboard and applications, such as the kindle app, which enable him to support these practices (carroll and fidock ). similarly to garry, russ (vignette , b ) has also appropriated the device. he has migrated almost the entirety of his computing activity to the ipad by adjusting his habits and routines as well as modifying the device itself through add-ons: "there are a few areas, such as image resizing, which it severely lacks, but there are cumbersome methods to achieve the desired results on the tablet." in russ' case, however, the motivation to identify these "cumbersome methods" and to employ them on a daily basis has been the increase in his productivity as a result of the lack of multitasking: "it's lack of side-by-side apps (i.e the traditional mac os x setup) means you actually end up focusing more on the work in hand, because there's nothing that's distracting you across the screen." going back to the definition of appropriation, the stories of garry and russ clearly illustrate that appropriation of the tablet means that users make the technology 'their own', despite their initial negative disconfirmation, specifically because they identify some benefit in using the it artefact (carroll and fidock ; clark ).
russ, for example, having been disappointed that the tablet doesn't allow easy image editing, explains that it is now his preferred device because it has increased his productivity, as it offers a more focused interaction, without distractions. as users learn how to use the new it device, identifying some beneficial use is of paramount importance and is the tipping point for eventually appropriating or rejecting the device. in those cases where users are unable to identify any benefits in using the ipad, or when such benefits come with sacrifices users are unwilling to make, the outcome of the trial and error behaviour is that of rejection. rejection suggests that the tablet cannot support the user in meeting their expectations and achieving their goals. this may mean that the tablet cannot reasonably substitute a previously owned it or non-it artefact or that it doesn't improve one's workflow in some way. one's goals and experiences with other forms of it, and even with non-it solutions (e.g., pen and paper), have an impact on how an interaction is experienced, because they weigh in on how users make sense of and act on technology (kendall et al. ; orlikowski and gash ); this is quite applicable when rejecting the ipad. harriett (b , vignette ) acquired the tablet in an attempt to uncover new it-centred use scenarios. without a specific use in mind, she suggests that the large form factor of the ipad was the main issue that caused her negative disconfirmation and her failure to integrate it into her everyday workflow. its size meant further constraints in relation to portability, ergonomics and connectedness with clients, all of which were obstacles to her productivity. the tentative solutions she identified could only intensify her negative disconfirmation and lead to further errors. as such, she was unable to identify any benefit in using the device, as she considered that her past practices of using, e.g., a legal notepad, posed fewer restrictions. this resulted in her returning the tablet. another user, dwayne (b ), similarly to harriett, notes that the restrictions he was faced with while using the tablet exceeded by far the benefits he was able to identify. even though he was "positively surprised by the ipads capabilities" as far as battery life goes, he was less impressed that, as a linux user, he was unable to access itunes, which would have allowed him to load all he needed onto the ipad. it is interesting to note, however, that rejection and appropriation seem to exist along a continuum, rather than being binary outcomes of trial and error. for example, michael (b , vignette ), having been using the tablet for different usages, considers the device "invaluable", because it allows him to make use of every possible moment while at work. but he is less content with how the tablet has been serving him during his rounds, because the tablet acts like a barrier between him and his patients, which he considers a restriction; he ultimately prioritises achieving better communication with his patients over his own convenience ("who wants a medical student (someday physician) who focuses more on a computer than on the person?"). as such, while the ipad enables new usages and offers some benefits (e.g., studying while on the go, accessing information), it also leads to ineffective practices; thus, he "put[s] away the ipad" when he is with patients.
in other words, through michael's example we see that users may appropriate the device for some use scenarios (in this case, accessing health records while on the go), but reject it for others. therefore, the conditions under which trial and error leads to rejection rather than appropriation have to do with perceived restrictions, and the extent to which these restrictions seem to balance out any potential benefits. in such instances, the perceived restrictions are the reasons for gaps between goals and reality, i.e., they prohibit users from using the tablet according to their initial goals. this is particularly clear in harriett's case, who notes that, while she could augment the tablet in a way that would allow her to use it as desired, the drawbacks of doing so would outweigh the benefits she was after (portability and flexibility). in this study, we focused on the volitional use of it in order to understand if and how users overcome initial negative disconfirmation, and why some users appropriate the it artefacts, whereas others fail or refuse to do so. drawing from grounded theory method techniques, we have identified two core categories, trial and error and outcomes, whereby trial and error is observed following negative disconfirmation, and the iterations of trials and errors result in either appropriating or rejecting the it artefact. fig. offers an illustration of how the core categories (trial and error and outcomes) have been built up on the basis of their relationships. as the figure shows, trial and error is composed of feedback loops (iterations), where a preliminary tentative solution may lead to different types of errors (tolerable, non-tolerable errors) or, indeed, no errors. when the feedback loop 'breaks' (dashed line), trial and error leads to the outcomes of appropriation or rejection, depending on the conditions. if, through trial and error and despite the experienced errors, the user has managed to identify some benefits, the most probable outcome will be that of appropriation. alternatively, if the user feels restricted, and especially if the experienced errors are deemed non-tolerable, then the most probable scenario will be that of rejection. what is interesting and begs consideration is that trial and error takes place against the backdrop of constant comparison, whereby the user is comparing the new against the old workflow, their new to their old experience, and ultimately the ipad to other artefacts, which are not necessarily it (e.g., comparison may entail comparing reading a book on the ipad versus reading an actual hardcover book). trial and error has been quite an influential concept, first appearing in the s, when behaviourists observed that consecutive efforts (trials) help overcome an obstacle or solve a problem (costall ). shirahada and hamazaki define it as "the process of continuous knowledge creation and acquisition until success is achieved" (shirahada and hamazaki , p. ). rerup and feldman draw attention to the fact that people compare outcomes to targets, and then revise their routines as necessary so that they can meet those targets (rerup and feldman ).
(fig.: connecting trial and error to outcomes following negative disconfirmation)
our findings show that, as the majority of users have an expressed interest in integrating the tablet in their workflow, upon experiencing negative disconfirmation they proceed with trying out different things in order to overcome it.
in doing so, they combine and recombine their available it devices and objects in order to identify ways to 'make technology work'; when they are successful in this endeavour, they transition from the 'technology as designed' to the 'technology in use' (elbanna and linderoth ), which echoes existing conceptualizations of appropriation (ahuja and thatcher ). however, when they are unable to identify ways to make it work in line with their goals, then the iterations of trial and error lead to rejection instead of appropriation. as a result, trial and error may be observed when users are faced with anomalies and try out new things to overcome problems (mcgann and lyytinen ). we consider trial and error to be a useful concept to discuss voluntary adoption of it. our findings show that, as users begin their interaction with the it artefact, they are required to update their perceptions, exploring what they can do with the it and how they can do it. previous studies have found that users may adapt themselves (i.e., changing their routines), or the it artefact (i.e., using external devices, additional applications etc.), or any combination of the two (e.g., barki et al. ; beaudry and pinsonneault ), in their endeavour to see what actions are possible (bagayogo et al. ; sun ). equally, it may be argued that the observed iterative attempts to identify and apply tentative solutions are nothing more than a variation of the adaptation cycles proposed by sun in his seminal paper on user revisions of technology (sun ). these iterative attempts at assessing tentative solutions are in turn reminiscent of the body of work on workarounds (e.g., choudrie et al. ; ferneley and sobreperez ; koopman and hoffman ), as they entail users tweaking the it artefact in subtle and less subtle ways, incorporating bundles of applications and other fixes (choudrie et al. ; ferneley and sobreperez ; koopman and hoffman ). there are fewer similarities between trial and error and improvisation. trial and error is primarily based on exploitation; as our findings show, tablet users seek to implement tentative solutions based on what they know and their experiences. in contrast, existing literature suggests that, when improvising, users tend to be more exploratory (scheiner et al. ). further, as improvisation is generally time-sensitive (pavlou and el sawy ), users may be able to pursue only a limited number of iterations. yet, as our findings show, users may proceed at their discretion with as many trials as they want or need to identify tentative solutions; it is telling that some users (peter, vignette ) pursue several iterations of trial and error, whereas others (harriett, vignette ) are less committed to identifying a solution. as such, echoing prior literature on trial and error, we posit that the only event that seems to stop the iterations is the identification of a good enough solution (miner et al. ; rerup and feldman ) or the belief that a solution may not exist (swann ). we thus consider that trial and error can function as an umbrella concept, essentially consolidating the different appropriation variations, specifically for the study of volitional use of it devices. at the same time, however, trial and error does not assume a priori that appropriation will be the only possible outcome.
our analysis shows that, despite all users' expressed intention to integrate the tablet into their workflow (vignettes and ), some of them were unable to find a good solution that could help them overcome their negative disconfirmation. therefore, trial and error, despite its problem-focused and solution-focused nature, and in contrast to existing appropriation variations, does not assume that there is a fit between user, task and technology that will eventually be achieved. instead, trial and error can be an open-ended process which may result in the simple abandonment of the device. this contribution is further supported by our findings if we consider that our analysis showed that appropriation and rejection exist along a continuum, whereby a user may appropriate the tablet for certain tasks and reject it for certain others. indeed, as in michael's case (vignette ), trial and error may lead to partial appropriation (or indeed partial rejection), if the user considers that the tablet is inappropriate in some way within a particular context of use, but at the same time adequate or ideal for others. this challenges the rationalist view of technology, whereby adoption, rejection and behaviours in between are mostly seen as decision situations (riemer et al. ), and further highlights the socially constructed and co-constructed nature of technology (leonardi and barley ), where the technology and its use evolve together (richter ). this draws attention to the conditions for each outcome. first, the essential component of appropriation is the identification of benefits (carroll and fidock ; clark ). these benefits motivate users to persist in identifying and trying out different tentative solutions despite their negative disconfirmation. such benefits may relate to productivity, performance, and convenience, among other things (dang et al. ), all of which are directly related to the needs and wants of the individual user. in light of this, wyatt ( ) posits that individual users are more likely to adopt some kind of technology if they consider that there are some benefits in its use, and they will only do so if those benefits significantly outweigh any risks that use may entail (e.g., time wasted). indeed, our findings clearly show that, when time savings and increases in productivity are deemed simply not enough in light of other sacrifices (such as lugging around a heavy object, being unable to focus on the primary activity or to properly interact with others), the outcome of the trial and error behaviour is that of rejection (michael and harriett, vignettes , ), where the user discontinues use for some or all tasks. this suggests that the tablet as an it artefact is not defined or interpreted by its functions and the included or excluded features in its design; instead, users assess it in more practical terms (i.e., if, how and when it can support them) and in terms of its potential place within their wider sociomaterial practices (kendall et al. ; riemer and johnston ). our theory of tablet appropriation through trial and error accounts for some of the shortcomings that presently exist within the it appropriation literature. specifically, by casting a wide net around negative disconfirmation and focusing on cases of tablet users with an expressed interest in incorporating the tablet within the constellation of their pre-existing it devices, we have identified how and why trial and error may result in the appropriation or the rejection of the tablet.
the use of such devices is often volitional rather than mandated, and thus users can freely abandon the device if they consider that other options are better. in contrast, existing studies that focus on some type of appropriation appear to do so with an underlying assumption that eventually users will be able to identify a fit between the task, the technology and their work practices, through modifications, adaptations, workarounds and the like. however, this constricted lens often excludes the possibility that such a fit may not exist and that, therefore, despite efforts to appropriate the it device, users may eventually reject it. our theory of trial and error manages to holistically account for both possibilities and explain their conditions within a volitional, individual use context, which is a somewhat neglected aspect of is use, despite the ubiquity of tablets. in light of this, we consider that our trial and error theory further contributes to work done by schmitz et al. (schmitz et al. ) in relation to malleable it and adaptation behaviours in two ways. like the adaptive structuration theory they propose, our theory addresses voluntary use of it in both personal and professional use contexts. both approaches uncover rich use scenarios, iterative, exploratory and exploitative behaviours, and numerous types of adaptations and modifications that can be combined with each other toward achieving appropriation. however, we extend this work by formally incorporating the possibility of the rejection outcome. our second contribution relates to the characteristics of this trial and error process. in contrast to organisational systems, such as enterprise resource planning (erp) systems, contemporary devices are quite heterogeneous in that they can fulfil a number of professional and private usages. while this heterogeneity may influence appropriation in a positive way, it can also have a negative effect if the learning costs are high. relatedly, devices such as tablets, smartphones and the like are platform-based devices, anchored to an ecosystem of complementors, apps and users. a user may benefit from others' trial and error learning (e.g., through their blogposts and forum comments) and, equally, from the availability of hardware extensions and software apps, all of which may expand their usability options and help them identify new usages. we also note that trial and error with devices within a volitional context of use is not typically conditioned by external stakeholders who could otherwise enforce temporal and other restrictions on it use. as a result, trial and error is an iterative behaviour that may continue unobstructed up until the point where the user feels either content or too disheartened to carry on. finally, at its core, trial and error involves comparing task completion with and without the particular device, a comparison that is often used as a means either to halt trial and error (and move towards rejection) or to move towards appropriation. studying how users use it artefacts when technology falls short holds great potential for it designers, manufacturers and organisations, because it opens up a window into why and how users appropriate or reject said artefacts. at the moment, there are numerous it devices in the market, competing for consumers' attention by promising increased productivity and performance, on top of a pleasant user experience. however, not all of them prove to be equally successful.
as technology becomes more and more consumerised, personal devices are used for both personal and professional use scenarios (dang-pham et al. ), and such is the case with devices like the tablet. users are able to exert increased control over it adoption and use because, within volitional contexts of use like bring-your-own-device schemes and consumerised environments, they can make their own choices (doargajudhur and dell ; hovav and putri ). they are thus able to abandon one it artefact for another, which may offer a better workflow or with which they may simply be more familiar. our findings can be used by it designers, manufacturers, practitioners as well as marketers to develop interventions and provide incentives that will facilitate appropriation over rejection. for example, designers could aim at alleviating feelings of being restricted by offering additional connectivity means or reducing existing barriers. equally, marketers and organisations could develop interventions for highlighting and maximising the identification of benefits of using a particular it artefact, especially when there are concerns in relation to a competitor and potential losses from switching. in conclusion, in a world where the success of an it artefact depends on continuous use and its appropriation and integration within one's workflows, our study offers an opportunity to better understand how individual users succeed or not in appropriating the tablet. most importantly, it lays the foundations for future studies by offering a theory grounded in the data that can be applied within other contexts. our study comes with limitations. while focusing on user behaviour post negative disconfirmation, we have not addressed its impact on a more holistic level so as to consider user experience and satisfaction. these concepts hold great significance for the design of it and can possibly influence the benefit-driven nature of the user (wyatt ), where such benefits may be more intangible. a second limitation stems from the nature of our empirical material and our methods, both of which are greatly influenced by the study's context and particularism (davison and martinsons ). our analysis builds on blog posts; as such, the pool of our users represents an intersection of tablet users and bloggers. in addition, these blogposts are authored by mostly male, north america-based ipad users, who hold upper-level managerial positions or are freelancers. this means that our findings are specific to the boundary conditions of this particular demographic. to an extent, this group may be considered homogeneous. however, this is far from the truth. several studies to date have shown that cultural values and national cultures play an important role in how people choose and make use of it, from using online social networks to smartphone devices and applications (e.g., chu et al. ; george et al. ; gupta et al. a). lastly, as evident from our findings, users have at their disposal a number of it devices and can thus use any of them, depending on their desires and preferences. therefore, our findings should be treated with caution, since they chiefly reflect the interactions of generally affluent professionals who can afford to experiment or 'play around' with it devices that are fairly expensive. in this study, we have developed a theory of trial and error that describes how users respond to moments of negative disconfirmation.
using gtm, we illustrated the trial and error process that users go through when exploring whether and how a new it artefact fits within their personal and professional lives, against the backdrop of a larger portfolio of multiple it artefacts. we have developed our theory from the ground up, where trial and error emerged as a process directly from the data. we consider that the first step should be the validation of our theory across other contexts, including different it devices and different users. in our study, we have focused on the volitional context of use. however, given that tablets are actively being used within organisations and issued as corporate devices, it would be interesting to explore how the findings from this research may apply when the users are the employees of an organization and it use is mandated; organizational culture would be an interesting concept to explore (gupta et al. b; scheibe and gupta ). in addition, it would be interesting to investigate how trial and error may unfold when users are less affluent and restricted to using one particular device only. furthermore, because of the importance of cultural values in it use, future studies should look into underlying differences across nationalities and/or ethnicities, focusing on the role of national culture as a potential explanatory driver for appropriation and rejection, and further theorise around our research questions. equally, we consider that an obvious future step would be a different type of generalisation attempt: rather than generalising to a different population and gauging differences on the basis of cultural values, one could examine whether the theory of trial and error for appropriation is replicable and remains valid for other types of it devices. this would be of additional interest for the post covid- world, where workers are already moving to working-from-home arrangements and where the choice of it artefacts may be made under their own volitional control, at a time when our dependence on technology for remaining professionally active and socially connected has been made abundantly clear (seetharaman et al. ).
early to adopt and early to discontinue: the impact of self-perceived and actual it knowledge on technology use behaviors of end users
moving beyond intentions and toward the theory of trying: effects of work environment and gender on post-adoption information technology use
the impact of design debugging on new product development speed: the significance of improvisational and trial-and-error learning
theory of workarounds
enacting computer workaround practices within a medication dispensing system
institutionalized computer workaround practices in a mediterranean country: an examination of two organizations
enhanced use of it: a new perspective on post-adoption
information system use-related activity: an expanded behavioral conceptualization of individual-level information system use
five-factor model personality traits as predictors of perceived and actual usage of technology
technological appropriations as workarounds: integrating electronic health records and adaptive structuration theory research
electronic trading and work transformation in the london insurance market
appropriation of information technology: a requisite for improved individual performance
understanding user responses to information technology: a coping model of user adaptation
understanding changes in belief and attitude toward information technology usage: a theoretical model and longitudinal test
understanding information systems continuance: an expectation confirmation model
blogging platform posterous to shut down on april th
enacting integrated information technology: a human agency perspective
cascading feedback: a longitudinal study of a feedback ecosystem for telemonitoring patients with chronic disease
reconceptualizing system usage: an approach and empirical test
beyond resistance to technology appropriation
constructing grounded theory: a practical guide through qualitative analysis
strategic relevance of organizational virtues enabled by information technology in organizational innovation
understanding individual user resistance and workarounds of enterprise social networks: the case of service ltd
a systematic review on cross-cultural information systems research: evidence from the last decade
notes on improvisation and time in organizations. accounting, management and information technologies
the labyrinths of information: challenging the wisdom of systems
anglo-american innovation
how lloyd morgan's canon backfired
have your cake and eat it too? simultaneously pursuing the knowledge-sharing benefits of agile and traditional development approaches
examining the impacts of mental workload and task-technology fit on user acceptance of the social media search system
investigating the diffusion of it consumerization in the workplace: a case study using social network analysis
user acceptance of computer technology: a comparison of two theoretical models
context is king! considering particularism in research design and reporting
understanding fit and appropriation effects in group support systems via meta-analysis
capturing the complexity in advanced technology use: adaptive structuration theory
impact of byod on organizational commitment: an empirical investigation
the validity of the improvisation argument in the implementation of rigid technology: the case of erp systems
the formation of technology mental models: the case of voluntary use of technology in organizational setting
user adaptation and is success: an empirical investigation among french workers. international conference of information systems
eureka moments in the works of claudio ciborra
resist, comply or workaround? an examination of different facets of user engagement with information systems
the effects of communication media and culture on deception detection accuracy
theoretical sensitivity: advances in the methodology of grounded theory
the discovery of grounded theory: strategies for qualitative research
the effects of national cultural values on individuals' intention to participate in peer-to-peer sharing economy
relationships between it department culture and agile software development practices: an empirical investigation
grandma's new tablet - the role of mobile devices in trying to innovate in it
everyday power struggles: living in an iois project
code saturation versus meaning saturation: how many interviews are enough? qualitative health research
human documents research: from the diary to the blog
this is my device! why should i follow your rules? employees' compliance with byod security policy
a comprehensive conceptualization of post-adoptive behaviors associated with information technology enabled work systems
understanding technology as situated practice: everyday use of voice user interfaces among diverse groups of users in urban india
investigating mobile wireless technology adoption: an extension of the technology acceptance model
work-arounds, make-work, and kludges
making sense of professional identities: stories of medical professionals and new technologies
understanding mandatory is use behavior: how outcome expectations affect conative is use
identifying it user mindsets: acceptance, resistance and ambivalence
email adaptation for conflict handling: a case study of cross-border interorganisational partnership in east asia
what's under construction here? social action, materiality, and power in constructivist studies of technology and organizing
do electronic health records affect quality of care? evidence from the hitech act
innovating with technology in team contexts: a trait activation theory perspective
designing for appropriation: a diy kit as an educator's tool in special education schools
the improvisation effect: a case study of user improvisation and its effects on information system evolution
qualitative data analysis
organizational improvisation and learning: a field study
a paradox of progressive saturation: the changing nature of improvisation over time in a systems development project
the problem with workarounds is that they work: the persistence of resource shortages
improvising organizational transformation over time: a situated change perspective
technological frames: making sense of information technology in organizations
user responses to new system implementation: a bricolage perspective. international conference on information systems (icis)
the "third hand": it-enabled competitive advantage in turbulence through improvisational capabilities
understanding fire fighting in new product development
routines as a source of change in organizational schemata: the role of trial-and-error learning
corporate social networking sites - modes of use and appropriation through co-evolution. australasian conference on information systems (acis)
what is it in use and why does it matter for is design?
eliciting the anatomy of technology appropriation processes: a case study in enterprise social media
on the role of information overload in information systems (is) success: empirical evidence from decision support systems
balancing fluid and cemented routines in a digital workplace
the effect of socialization via computer-mediated communication on the relationship between organizational culture and organizational creativity
organisational and individual unlearning in identification and evaluation of technologies
capturing the complexity of malleable it use: adaptive structuration theory for individuals
being (more) human in a digitized world
trial and error mindset of r&d personnel and its relationship to organizational creative climate
tumblr's porn ban could be its downfall - after all, it happened to livejournal. the verge
understanding user revisions when using information system features: adaptive system use and triggers
the object of knowledge: the role of objects in biomedical innovation
what happens when learning takes place? interchange
how and why trust matters in post-adoptive usage: the mediating roles of internal and external self-efficacy
grounded theory for qualitative research: a practical guide
using grounded theory method in information systems: the researcher as blank slate and other myths
multiple faces of codification: organizational redesign in an it organization
a theoretical extension of the technology acceptance model: four longitudinal field studies
user acceptance of information technology: towards a unified view
extending the two-stage information systems continuance model: incorporating utaut predictors and the role of context
technological embeddedness and organizational change
grounded theory methodology in information systems research
understanding user adaptation toward a new it system in organizations: a social network perspective
non-users also matter: the construction of users and non-users of the internet
accommodating practices during episodes of disillusionment with mobile it
we collected the empirical material in two stages, between march - august and january - july . during the first stage, we collected blogposts, authored by unique bloggers. during the second stage, we collected additional blogposts, authored by unique bloggers. the complete casebook of the study is shown in table (appendix ). for both stages, our search strategy entailed initiating a google search at first, using the following keywords: "experience" and "ipad" and "blog". we initially conducted this research in a nondiscriminatory manner, in order to get a preliminary idea about the themes bloggers tend to discuss. we then focused specifically on the main blogging platforms, i.e., wordpress.com, medium.com, blogger.com, tumblr.com, posterous.com (now defunct). within these platforms, we used the search functionality as well as the hashtag or tag functionality to identify additional relevant posts (snowball sampling). the collated blog posts were then examined against our inclusion and exclusion criteria. we excluded any blogpost that could be seen as being a technical review, as affiliated directly or indirectly with apple inc.
or as containing indications that the blogpost had been endorsed in some way or sponsored by any of the manufacturers/developers of any of the products and/or services mentioned in the blog. relatedly, to be included, each blogpost had to: a) contain a rich description of the blogger's interaction with the tablet, b) describe voluntary use of the device within both professional and personal use scenarios, c) contain a description of negative disconfirmation, i.e., the user attempting to use the device in a particular way but failing to do so for one or more reasons, and d) describe an underlying effort to overcome disconfirmation. these criteria allowed us to collect material that contained contextual and processual information, supporting us in addressing our research question. specifically for the second stage of data collection (january - july ), which was driven by theoretical sampling, we purposefully sampled blogposts with the aim of enriching the meaning of our existing emerging codes, rather than expanding the reach of our evolving theory. in other words, we aimed at identifying additional cases where blog authors were discussing the same concepts, so that we could densify our theory by verifying the usefulness of the core categories and establishing the core conditions for each. considering the nature of our empirical material, it is critical to note that some of the blogging platforms are now defunct, and several of the blogs are not online anymore. for example, posterous.com, once a very popular blogging platform, shut down in early (bishop ). with it, lots of our empirical material vanished. similarly, when in late , tumblr.com announced the ban of a certain type of content, millions of posts vanished from the platform, which led to a mass migration of users onto other platforms (stephen ).
key: cord- - vs mq authors: zhou, tongxin; wang, yingfei; yan, lu; tan, yong title: spoiled for choice? personalized recommendation for healthcare decisions: a multi-armed bandit approach date: - - journal: nan doi: nan sha: doc_id: cord_uid: vs mq online healthcare communities provide users with various healthcare interventions to promote healthy behavior and improve adherence. when faced with too many intervention choices, however, individuals may find it difficult to decide which option to take, especially when they lack the experience or knowledge to evaluate different options. the choice overload issue may negatively affect users' engagement in health management. in this study, we take a design-science perspective to propose a recommendation framework that helps users to select healthcare interventions. taking into account that users' health behaviors can be highly dynamic and diverse, we propose a multi-armed bandit (mab)-driven recommendation framework, which enables us to adaptively learn users' preference variations while promoting recommendation diversity at the same time. to better adapt an mab to the healthcare context, we synthesize two innovative model components based on prominent health theories. the first component is a deep-learning-based feature engineering procedure, which is designed to learn crucial recommendation contexts in regard to users' sequential health histories, health-management experiences, preferences, and intrinsic attributes of healthcare interventions. the second component is a diversity constraint, which structurally diversifies recommendations in different dimensions to provide users with well-rounded support.
we apply our approach to an online weight management context and evaluate it rigorously through a series of experiments. our results demonstrate that each of the design components is effective and that our recommendation design outperforms a wide range of state-of-the-art recommendation systems. our study contributes to the research on the application of business intelligence and has implications for multiple stakeholders, including online healthcare platforms, policymakers, and users. internet technologies enable information to be generated and disseminated at almost no cost, which accelerates the growth of information in online environments. social media platforms, for example, allow users to share abundant content, including blogs, music, videos, and other formats, which individuals can freely choose to consume. although various information options increase individuals' choice opportunities, having too many choices can be overwhelming and sometimes even confusing. often, individuals experience difficulties in spotting content that is truly relevant to themselves or in which they are indeed interested (konstan and riedl ; ricci et al. ) . such a choice overload issue can lessen users' experience and create barriers to individuals' engagement in online platforms. online healthcare communities (ohcs), which are social-media-based platforms that gather users with similar health-management interests, are no exception. due to their easy access, ohcs are increasingly being used by individuals to learn about their illness, become familiar with treatment routines, and connect with others in similar circumstances. typical ohcs provide users with various healthcare interventions to promote healthy behavior and improve adherence. examples include behavioral treatment programs or plans that help individuals to establish healthy habits in regard to diet and physical exercise. during the recent covid- pandemic, for example, individuals often engage in online work-out activities to relieve stress and stay healthy (pew research center ). individuals can freely choose interventions in which to participate in an online environment. when faced with too many choices, however, individuals may find it difficult to decide which option to take, as they may not know what would work or even what to expect, especially when they are not healthcare professionals and do not have adequate experience in evaluating each choice. as a result, they may fall into analysis paralysis (oulasvirta et al. ) and fail to engage in any health-management activities. this may harm their self-intervention adherence and outcome (nutting et al. ; snyderman and dinan ) . the choice overload issue significantly affects one's participation experience or outcome in ohcs, leading to the pressing demand for services that can better fit individuals' healthcare needs. therefore, in this study, we aim to follow the design-science paradigm to develop a personalized healthcare recommendation system as a means to support individuals' engagement in health management. recommendation systems are intelligence-based algorithms that can help users to filter information and discover alternatives that they might not have found otherwise (konstan and riedl ; vozalis and margaritis ) . existing recommendation systems deploy various approaches to learn users' preferences from user-behavior data, such as collaborative filtering, content-based filtering, and hybrid models . 
research has shown that recommendation systems can effectively improve business performance and customer experience (konstan and riedl ; pu et al. ) in ecommerce settings. despite their extensive use in ecommerce settings, whether and how recommendation systems can be integrated with online healthcare platforms has received little attention and remains largely underexplored. there are several unique patterns associated with users' health behaviors that create challenges in healthcare recommendations. first, previous health studies suggest that individuals' health behaviors are frequently affected by their evolving health status and health-management experiences (johnson et al. ; king et al. ; yan and tan ). thus, individuals' healthcare needs can exhibit strong temporal dynamics. second, individuals' health management usually contains multi-dimensional effort, as promoting health requires individuals to make a series of changes in all aspects of motivation and lifestyle. for instance, in weight management, individuals need to jointly monitor and manage different behavioral aspects, such as dietary behaviors and participation in physical activities. these patterns indicate that individuals' healthcare needs can be diverse, as individuals may need support for each type of health-management activity. given these unique patterns of individuals' health behaviors, conventional recommendation systems that are proven effective in ecommerce settings may not be effective in the healthcare context. this is because these algorithms generally exploit historical data to learn users' preferences. as individuals' health behaviors are continually changing, their health-behavior variations may not be fully captured by the historical data, especially when individuals' health-behavior data remain limited. thus, a mere exploitation of historical data may not be sufficient in healthcare recommendations. in addition, when individuals do not have well-established preferences about healthcare interventions, they may dynamically form their preferences based on the recommended items. conventional recommendation systems do not take into account such interactions between users and recommendations and, thus, may not be effective in improving long-term recommendation performance (liu et al. ). finally, conventional recommendation systems are generally shown to over-specialize recommendations (fleder and hosanagar ; pariser ). thus, they may not adequately support users' diverse healthcare needs. these research gaps motivate us to propose a novel recommendation design that utilizes a multi-armed bandit (mab) as the main building block. an mab is an online-learning framework in statistics and machine learning for solving decision-making problems in noisy or changing environments (auer et al. ; chapelle and li ). specifically, when decision-makers (e.g., service providers) do not know the outcome of an action (e.g., recommendation), an mab can help them to sequentially select choice alternatives while actively gathering information on each alternative's expected payoff (zeng et al. ). in this process, an mab strikes a balance between exploiting the learned knowledge to gain immediate rewards (reusing a highly rewarding alternative from the past) and exploring potentially better alternatives (trying new or less-used alternatives to gather more information), which is known as the "exploitation-versus-exploration" tradeoff. by doing so, an mab aims to maximize the cumulative reward during the entire decision-making period.
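to make the exploitation-versus-exploration tradeoff concrete, below is a minimal, generic sketch of a bernoulli thompson sampling bandit, one common mab heuristic; it is not the algorithm proposed in this paper, and the arm reward probabilities, horizon, and beta(1, 1) priors are illustrative assumptions only. each arm's unknown reward rate is sampled from its posterior, the arm that currently looks best is played, and the posterior is updated with the observed feedback, so the learner earns while it learns. in a recommendation setting, the 'arms' would correspond to candidate healthcare interventions and the binary reward to a user's engagement signal.

```python
import numpy as np

# minimal bernoulli thompson sampling sketch (illustrative only):
# each arm's unknown reward probability gets a beta(1, 1) prior that is
# updated from binary feedback, balancing exploitation and exploration.
rng = np.random.default_rng(0)
true_reward_prob = [0.10, 0.25, 0.40]   # assumed; unknown to the learner
n_arms = len(true_reward_prob)
successes = np.ones(n_arms)             # beta posterior parameter alpha
failures = np.ones(n_arms)              # beta posterior parameter beta

cumulative_reward = 0
for t in range(5000):
    sampled = rng.beta(successes, failures)          # one plausible reward rate per arm
    arm = int(np.argmax(sampled))                    # play the arm that looks best now
    reward = rng.random() < true_reward_prob[arm]    # observe binary feedback
    successes[arm] += reward                         # posterior update
    failures[arm] += 1 - reward
    cumulative_reward += reward

print(cumulative_reward, successes / (successes + failures))
```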
in the healthcare recommendation context, service providers tend to have little knowledge about users' healthcare preferences, as individuals may constantly change their health behaviors and healthcare needs. thus, the mab framework can be used in such a setting to efficiently guide the learning of users' changing healthcare needs. in addition, through the exploration process, an mab framework can promote the discovery of users' diverse healthcare needs, which may not be revealed by their historical behavior data. to better adapt an mab to the healthcare-recommendation context, we follow prominent health-behavior theories to further extend and enhance a standard mab by synthesizing two model components, that is, deep-learning-based feature engineering and a diversity constraint. first, we design and implement two deep-learning models to extract user embeddings and item embeddings, which enables us to capture information that is critical to a healthcare decision-making context, such as users' health histories and health-behavior sequences (johnson et al. ; king et al. ; yan and tan ) and intrinsic attributes of healthcare interventions. taken together, the constructed user embeddings and item embeddings help to improve the personalization and contextualization of healthcare recommendations. the second model component is incorporated based on social cognitive theory (sct) (bandura ; bandura ). sct proposes a classic paradigm for understanding individuals' personal-influence-based health-management behaviors. based on sct, we theorize the major dimensions of health management, and we use a diversity constraint to ensure that recommendations are structurally diversified along each of the health-management dimensions, so that individuals are provided with well-rounded support. to this end, we propose a thompson sampling (ts)-based algorithm to solve this constrained recommendation task. our proposed recommendation framework is evaluated through a series of experiments, using data collected from a leading non-commercial online weight-loss platform in the united states. the focal platform provides weight-loss challenges to users, which are structured behavioral treatment programs to help users to manage short-term weight-loss goals, such as changing a dietary behavior, increasing physical exercise, and reducing weight in certain periods. we apply our recommendation framework to this weight-management setting to help users to find the most relevant challenges as a means to improve their engagement in weight-management activities. our evaluation results suggest that each of our proposed model components is effective and that our recommendation framework significantly outperforms a wide range of benchmark models, including ucb, ε-greedy, and state-of-the-art conventional recommendation systems, such as context-aware collaborative filtering (cacf), probabilistic matrix factorization (pmf), and content-based filtering (cb). in addition, we demonstrate that our recommendation framework can more effectively learn the dynamics and the diversity distribution in users' challenge choices. from users' perspectives, we find that our recommendation design can serve to benefit a larger user population on the platform. finally, we take a further step to evaluate our recommendation performance with respect to users' weight-loss outcomes. the evaluation results suggest that our proposed recommendation design can help users to achieve the highest average weight-loss rate compared to the benchmark models.
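the exact formulation of the proposed diversity constraint and ts-based algorithm is not reproduced here; as a rough, hedged illustration of how structural diversification could be layered on top of thompson-sampled scores, the sketch below fills a recommendation slate while enforcing a minimum number of items per health-management dimension (e.g., outcome-oriented versus behavior-oriented challenges). the dimension labels, quota values, and toy scores are assumptions for illustration, not the platform's data or the authors' specification.

```python
from collections import defaultdict

def diversified_slate(sampled_scores, item_dims, quotas, slate_size):
    """greedily fill a slate from sampled scores while enforcing a minimum
    number of items per health-management dimension (illustrative sketch).
    sampled_scores: {item: score}; item_dims: {item: dimension};
    quotas: {dimension: minimum count}."""
    ranked = sorted(sampled_scores, key=sampled_scores.get, reverse=True)
    slate, counts = [], defaultdict(int)

    # first pass: satisfy each dimension's quota with its best-scoring items
    for dim, minimum in quotas.items():
        for item in ranked:
            if counts[dim] >= minimum or len(slate) >= slate_size:
                break
            if item_dims[item] == dim and item not in slate:
                slate.append(item)
                counts[dim] += 1

    # second pass: fill any remaining slots purely by sampled score
    for item in ranked:
        if len(slate) >= slate_size:
            break
        if item not in slate:
            slate.append(item)
            counts[item_dims[item]] += 1
    return slate

# assumed toy data: scores from a bandit's sampling step, two dimensions
scores = {"goal_5lb": 0.62, "diet_log": 0.58, "walk_30min": 0.41, "goal_bmi": 0.39}
dims = {"goal_5lb": "outcome", "goal_bmi": "outcome",
        "diet_log": "behavior", "walk_30min": "behavior"}
print(diversified_slate(scores, dims, {"outcome": 1, "behavior": 1}, 3))
```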
our study makes several key contributions to the literature and practice. first, one major contribution of our study is the proposed healthcare recommendation framework, which demonstrates that prescriptive analytics can be integrated via a design-science artifact (abbasi et al. ; chen et al. ) to provide decision-making support for individuals' health management. the novel aspects of our recommendation framework include ( ) a deep-learning-based feature engineering procedure, ( ) a domain-knowledgedriven diversity constraint, and ( ) a customized online-learning scheme. to the best of our knowledge, our study is among the first to combine an mab with deep context representations and to introduce recommendation constraints for diversity promotion. second, from a practical perspective, our recommendation framework can be applied to address real-world challenges in healthcare recommendations. online healthcare platforms can adopt our recommendation design to improve users' health-management experience on the platform. finally, the design of our recommendation framework can be further generalized to settings beyond healthcare. the online-learning scheme of an mab enables decision-makers to adaptively adjust their strategies to minimize opportunity cost, and the deep-learningbased feature engineering procedure can help decision-makers to better understand the context-dependency of their decision results. our study is related primarily to two streams of literature, that is, individuals' health management and recommendation systems. in the following, we first review prominent health-behavior theories to identify the unique behavior patterns associated with individuals' health management. this discussion provides the theoretical foundation for our recommendation design. we then review the existing recommendation algorithms and discuss their limitations in delivering healthcare recommendations with respect to individuals' health-behavior patterns. finally, we introduce an online-learning framework, mab, which has gathered increasing attention from the literature for its capability of solving decision-making problems under uncertainty. we explain how an mab framework can be implemented in capturing individuals' health-behavior patterns in the healthcare recommendation process. individuals' lifestyles play a significant role in affecting their quality of health. poor health behaviors, such as smoking, alcohol abuse, and sedentary living habits, have been shown to be associated with multiple health risks (cdc ). thus, the management of personal health usually requires individuals to invest effort into making a health-behavior change. for example, in managing a chronic condition, such as obesity or type diabetes, patients need to continually self-regulate their ongoing lifestyle in regard to dietary behaviors and participation in physical activities. researchers have found that patients' active engagement in health management is generally associated with improved adherence to treatment plans and better health outcomes (nutting et al. ; snyderman and dinan ) . according to prior health-behavior studies and theories (bandura ; bandura ; johnson et al. ) , individuals' health management may exhibit unique patterns, such as behavior dynamics and diversity. these patterns play a decisive role in shaping individuals' preferences for healthcare interventions and affect the design of healthcare recommendation systems. 
in the following, we introduce several prominent health-behavior theories to motivate our recommendation design. previous health studies have generally depicted individuals' health management as a dynamic process. johnson et al. ( ) suggested that, in the process of health management, individuals may frequently adapt their health behaviors based on their personal health condition and health-management experiences, such as treatment compliance, self-monitoring, and healthcare-knowledge seeking. in addition, the social environment may dynamically transform individuals' health behaviors by affecting mental well-being (king et al. ; yan and tan ) . for instance, the exchange of emotional or informational support among peers may encourage optimism and self-esteem of individuals (dimatteo ), which can help them to better comply with a treatment plan and make a behavior change (johnson and wardle ; krukowski et al. ; wang et al. ) . together, these studies indicate that individuals' health behaviors need to be understood with respect to specific health and social context. as health and social context may evolve with time, individuals' health behaviors generally exhibit strong temporal dynamics. psychosocial theories have extended our understanding of how cognitive and social factors contribute to personal health, among which sct (bandura ; bandura ) is widely used in the health literature to describe individuals' health-management behaviors. sct proposes a personal-influence-based selfregulation model, in which individuals exert control over their motivation and behaviors to achieve better health outcomes. the theory suggests that individuals' self-regulation contains multi-dimensional effort. first, individuals need to set proper health goals to motivate themselves toward a desirable health outcome. second, individuals need to operationalize their goals into actual behavioral aspects so that they can gain behavior-management skills and strategies to tackle challenges and fulfill expectations effectively. depending on health contexts, individuals may need to attend to different behavioral aspects at the same time. in weight management, for example, individuals need to manage both their dietary behaviors and physical activities to control their calorie intake and expenditure. based on sct, there are two major health-management dimensions: outcome-oriented dimension(s) and behavior-oriented dimension(s). the former influences individuals' motivation for health behaviors, whereas the latter affects the course of behavior execution. corresponding to these dimensions, individuals may need different types of support to guide them through the self-regulation process. for instance, individuals may need instructions on setting reasonable health goals to help them understand and manage their progress toward a targeted health condition; as well, they may need suggestions on how to cope with difficulties in the process of establishing health-behavior routines. these patterns indicate that individuals' healthcare needs can be diverse. the dynamic and multifaceted nature of health management has brought new challenges in healthcare recommendations. in this section, we review the existing recommendation systems to discuss the research gaps associated with conventional recommendation schemes that have impeded them from addressing individuals' unique health-behavior patterns. 
recommendation systems are intelligence-based decision-making algorithms that can help users to filter information or product choices based on their own preferences or interests, especially when there is information or product overload (konstan and riedl ; vozalis and margaritis ). during the last few decades, recommendation systems have garnered considerable attention from both academia and industry for their capability of delivering personalized services and generating benefits for service providers and customers (isinkaye et al. ; pathak et al. ; pu et al. ). in the literature, a large body of research has focused on batch-learning-based recommendation systems, such as collaborative filtering, content-based filtering, and hybrid models. these recommendation systems generally adopt a "first learn, then earn" recommendation scheme. that is, they first learn users' preference patterns based on a series of historical data, and then they fully exploit the learned knowledge to make future recommendations. for example, collaborative filtering makes recommendations based on similarities in users' item-selection histories (adomavicius and tuzhilin ; sedhain et al. ), and content-based filtering leverages the content attributes of users' previously selected items (bieliková et al. ; pon et al. ). previous studies have proposed a variety of techniques to learn users' preference patterns from historical data, such as context-aware recommendation systems (cars) that model the contextual dependency of users' behaviors, and model-based techniques that learn latent user representations. the "first learn, then earn" scheme, however, is based on the assumption that users' preferences have a static pattern that can be well represented by the historical data (adomavicius and tuzhilin ; sahoo et al. ). when users' preferences are constantly changing, such recommendation methods may become less effective in adapting to individuals' behavior dynamics, as it is likely that individuals' preference patterns will not be fully captured by the data. in addition, prior studies have generally shown that batch-learning-based models tend to over-specialize recommendations in the long run (yu et al. ), as they tend to focus on well-known items that have already accumulated adequate historical information, whereas items with limited historical data are overlooked (fleder and hosanagar ; pariser ). as a result, these models can be ineffective in satisfying individuals' diverse healthcare interests. these research gaps motivate us to propose an online-learning scheme, i.e., the multi-armed bandit (mab), to address the dynamics and diversity in individuals' health behaviors and thereby improve healthcare recommendations. in most real-world decision-making scenarios, decision-makers usually do not know the expected utility of an action and can learn only from experience (cohen et al. ; mehlhorn et al. ; speekenbrink and konstantinidis ). in statistics and machine learning, the multi-armed bandit (mab) has been proposed to explicitly formulate such decision-making scenarios under uncertainty (auer et al. ; gittins ). specifically, an mab models a sequential decision-making problem in which the underlying reward distribution for each action is unknown, and data can be obtained in a sequential order to update knowledge of the reward distribution.
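to make this "learning from experience" setting concrete, here is a minimal, self-contained sketch (not taken from the paper) of an ε-greedy agent playing bernoulli arms whose reward probabilities are unknown; the arm probabilities, horizon, and ε value are illustrative assumptions.

```python
import random

def epsilon_greedy_bandit(true_probs, horizon=1000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on Bernoulli arms with unknown means."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms        # how often each arm was chosen
    estimates = [0.0] * n_arms   # running mean reward per arm (learned from experience)
    total_reward = 0
    for _ in range(horizon):
        if rng.random() < epsilon:                      # explore a random arm
            arm = rng.randrange(n_arms)
        else:                                           # exploit the current best estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]   # incremental mean update
        total_reward += reward
    return estimates, total_reward

# illustrative arms: the agent has to discover that the third arm pays off most
print(epsilon_greedy_bandit([0.2, 0.5, 0.7]))
```

the same loop structure carries over to the recommendation setting discussed next, with arms replaced by healthcare interventions and rewards by users' engagement feedback.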
the rationale of an mab algorithm is to adaptively learn the reward associated with each action while gathering as much reward as possible during the entire decision-making process, that is, earning while learning (misra et al. ). in order to do so, an mab strikes a balance between exploration and exploitation (kim and lim ; li et al. ; tang et al. ). that is, on the one hand, an mab reuses highly rewarding alternatives from the past to secure explicit short-term rewards, that is, "exploiting" the environment (cohen et al. ; mehlhorn et al. ); on the other hand, it takes actions to learn the outcome associated with less-explored alternatives to minimize opportunity cost, that is, "exploring" the environment (cohen et al. ; speekenbrink and konstantinidis ). it is worth noting that the online-learning scheme stands in contrast to batch-learning algorithms. the former actively collects data to learn the environment, with a forward-looking goal of maximizing long-term rewards. in other words, online learning may deviate from the current "best" knowledge from time to time in exchange for potentially better learning performance and higher rewards collected in the future. in contrast, batch-learning algorithms fully exploit the current knowledge without exploring potentially better opportunities that are not shown in the historical data. as such, they tend to interact with the environment in a passive and myopic manner and may not learn effectively when the environment contains many uncertainties that cannot be represented by current data. research has shown that online-learning algorithms, i.e., mabs, are suitable for tackling decision-making problems in noisy and changing environments (speekenbrink and konstantinidis ). for example, misra et al. ( ) applied an mab to a pricing problem in which the volume of demand was uncertain. schwartz et al. ( ) used an mab to improve advertising design when online advertisers were not able to identify targeted users. in our healthcare-recommendation context, service providers (e.g., online healthcare platforms) usually have little knowledge of users' healthcare needs or preferences, especially when users frequently change their behavior patterns. an mab can be used in such a setting to help service providers to effectively explore users' preference variations while improving users' online engagement during the process. in addition, through exploration, an mab increases choice stochasticity and, thus, can better promote recommendation diversity (qin et al. ). despite these advantages, mabs are seldom studied in healthcare recommendation problems. in this study, we enrich the healthcare recommendation literature by designing an mab-driven framework for providing personalized healthcare interventions. we propose a deep-learning and diversity-enhanced mab framework for recommending healthcare interventions to address the challenges and research gaps presented in the previous section. first, we adopt an mab as the main building block of our framework, as it can effectively explore variations in users' healthcare preferences and promote recommendation diversity at the same time. to better adapt an mab to the healthcare recommendation setting, we then further enhance our framework by synthesizing two model components, that is, deep-learning-based feature engineering and a diversity constraint. as suggested by prior health studies
(johnson et al. ; king et al. ; yan and tan ), individuals' health behaviors are dynamically affected by a series of contexts, including their evolving health status, health-management experiences, and social context. based on these studies, the sequential information embedded in individuals' health histories and health-behavior paths can play an essential role in shaping individuals' health behaviors. deep-learning models can effectively capture patterns from dynamic temporal sequences and extract complex synergies between different features, thereby enabling an enhanced representation of variations in individuals' health behaviors. we thus incorporate a deep-learning-based feature engineering procedure to improve recommendation personalization and contextualization. in addition, sct suggests that individuals' health management may contain multi-dimensional efforts. the diversity constraint helps us to structurally diversify recommendations along each theory-driven health-management dimension so that individuals are provided with well-rounded support. in figure , we provide a graphical illustration of our recommendation design. each construct of the recommendation framework is intended to enhance healthcare recommendation performance. in a recommendation cycle, we first use deep-learning models to construct representations for users and items, i.e., the user embeddings and item embeddings. we then use the constructed embeddings to capture the contextual features of the recommendation environment, which enables us to learn the context-dependency of the recommendation results and generalize users' feedback. the mab algorithm, shown on the right side of figure , adaptively learns users' preferences by balancing the exploitation-versus-exploration tradeoff. the diversity constraint seeks to diversify the recommendations along the theorized health-management dimensions. we elaborate on each of these constructs in the remainder of this section. ohcs provide healthcare interventions to encourage and instruct individuals' health behaviors. in table , we provide several examples of typical healthcare interventions provided in ohcs. we consider the setting in which an online healthcare platform provides intervention suggestions to users on a regular (e.g., weekly) basis. it is unlikely that every individual user will prefer the same interventions, and there is interpersonal heterogeneity in terms of which interventions to adopt and to what extent. the goal of the platform is to adaptively suggest the k interventions with the highest chance of improving individuals' engagement in their health management. formally, let t be the number of recommendation periods and i be the number of users on the platform. suppose that in each period the platform provides each user with k alternatives drawn from the full set of available interventions. at the end of each period, the platform receives users' feedback on the recommendations, that is, whether they have adopted or engaged in the recommended interventions. let r_t(i, k) denote user i's feedback on item k. users' feedback serves as a reward for the platform's recommendation decisions; the platform may use this feedback to update its knowledge about users' preferences and adjust its subsequent recommendations. we formulate the above recommendation problem as a contextual mab.
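as a schematic of this setup (t periods, i users, k recommendations per user, and binary engagement feedback r_t(i, k)), the skeleton below is a hypothetical illustration only; recommend, get_feedback, and update are placeholder callables, not functions from the paper.

```python
def run_recommendation_horizon(users, catalog, n_periods, k, recommend, get_feedback, update):
    """Generic online-learning recommendation loop.

    recommend(user, catalog, k)   -> list of k interventions chosen by the policy
    get_feedback(user, items)     -> {item: 0/1} engagement rewards r_t(i, k)
    update(user, items, rewards)  -> policy update from the newly observed feedback
    """
    history = []
    for t in range(n_periods):              # recommendation periods (e.g., weeks)
        for user in users:                  # each user i on the platform
            items = recommend(user, catalog, k)
            rewards = get_feedback(user, items)   # feedback observed only for recommended items
            update(user, items, rewards)          # online-learning step
            history.append((t, user, items, rewards))
    return history
```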
in a contextual bandit, the choice decision is based on a set of contextual features of the environment, such as the attributes of the choice alternatives and user characteristics, so that the algorithm can exploit the similarity between choice alternatives and deliver online personalized recommendations (zeng et al. ). contextual mabs learn to map the contexts into appropriate actions (greenewald et al. ). thus, they are able to personalize recommendations based on specific situations. in addition, as the pool of users and healthcare interventions is likely to undergo frequent changes, it is desirable to learn a feature-based model that can generalize users' behavior histories to user-item pairs that have never or rarely occurred in the past. to this end, in the healthcare recommendation context, we consider two sets of contextual information: individuals' health-management contexts x_it and the attributes of healthcare interventions z_k. we assume that users' feedback, i.e., the intervention engagement decision r_t(i, k), is stochastically generated by an underlying probability that depends on the contexts. we model this probability as a logistic function, p(r_t(i, k) = 1) = σ(θ* · v_itk), where v_itk denotes the concatenation of x_it and z_k, σ(·) is the logistic function, and θ* denotes the underlying coefficient vector, which can be learned adaptively in the recommendation process. the objective for the online healthcare platform is to maximize the expected cumulative user engagement during the entire course of recommendation, i.e., to maximize E[Σ_t Σ_i Σ_{k ∈ s_it} r_t(i, k)], where s_it denotes the set of interventions recommended to user i in period t. modern recommendation systems should be well-diversified, motivated by the principle that recommending redundant items leads to diminishing returns on utility. in the context of healthcare recommendations, the major health-management dimensions that we identify based on sct include the outcome-oriented dimension(s) and behavior-oriented dimension(s). whereas the former helps individuals to gain outcome-driven motivation, the latter enables them to acquire health-management skills and strategies. thus, to provide individuals with well-rounded support, recommendations need to cover each of the health-management dimensions. although an mab framework is able to promote recommendation diversity through the exploration process, we further incorporate a diversity-constrained mab to ensure that the exploration is conducted in guided directions and that the recommendations are structurally diversified along each of the health-management dimensions. formally, our diversity constraint d can be expressed as requiring s_it ∩ dim_outcome ≠ ∅ and s_it ∩ dim_behavior ≠ ∅, where s_it denotes the recommendation set, dim_outcome denotes the outcome-oriented dimension(s), and dim_behavior denotes the behavior-oriented dimension(s). we subject the optimization problem above to the diversity constraint d to ensure that the recommendation set s_it contains suggestions for each health-management dimension. to solve this constrained recommendation task, we propose an algorithm that is adapted from thompson sampling (ts). ts is a machine-learning algorithm that addresses the exploitation-versus-exploration tradeoff presented in a bandit problem. ts is best understood in a bayesian setting, in which it computes the posterior distribution of the unknown parameters θ in the likelihood function, given the realized stochastic feedback. the rationale of ts is to encourage exploration through probability matching. that is, in each round, a ts algorithm randomly draws alternatives according to their probability of being optimal.
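the following sketch illustrates one recommendation round of thompson sampling with a coverage-style diversity constraint, under simplifying assumptions that are mine rather than the paper's: the posterior over θ is approximated by independent gaussians per coefficient, the engagement probability is the logistic model σ(θ · [x_it; z_k]) given above, and the constraint is enforced greedily by first taking the best-scoring item from each health-management dimension and then filling the remaining slots.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def constrained_ts_round(post_mean, post_var, user_context, items, k, rng):
    """One Thompson-sampling round with a diversity (coverage) constraint.

    post_mean, post_var : per-coefficient Gaussian posterior over theta
    user_context        : feature vector x_it for the user
    items               : list of (item_id, item_features z_k, dimension_label)
    """
    theta = rng.normal(post_mean, np.sqrt(post_var))        # posterior draw (probability matching)
    scores = {item_id: sigmoid(theta @ np.concatenate([user_context, z]))
              for item_id, z, _ in items}

    chosen = []
    for dim in {d for _, _, d in items}:                    # cover every health-management dimension
        best = max((it for it in items if it[2] == dim), key=lambda it: scores[it[0]])
        chosen.append(best[0])
    remaining = sorted((i for i, _, _ in items if i not in chosen),
                       key=lambda i: scores[i], reverse=True)
    chosen += remaining[:max(0, k - len(chosen))]
    return chosen[:k]

# usage sketch: rng = np.random.default_rng(0); after each round the Gaussian
# posterior (post_mean, post_var) would be updated from the observed feedback.
```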
research has shown that ts generally has better empirical performance than alternative bandit algorithms, such as ucb and ε-greedy (chapelle and li ). our algorithm extends an ordinary ts by integrating a constrained optimization problem to solve for the optimal recommendation decisions subject to the diversity constraint. we present the details of our algorithm below.

a ts algorithm with diversity constraint
input: prior mean m_j and prior variance u_j for each parameter θ_j.
for each recommendation round: (sampling) draw a coefficient vector θ from the current posterior; (optimization) solve the constrained optimization problem that selects the recommendation set maximizing the sampled engagement probabilities, subject to the recommendation size constraint, the binary decision variables, and the diversity constraint d; (update) observe a new batch of feedback data and update the posterior mean and posterior variance of each parameter accordingly.

to improve the characterization of individuals' health-management contexts and enhance recommendation personalization, we design a deep-learning model to construct user embeddings. specifically, our user-embedding model leverages information on users' attribute features (e.g., gender, age, etc.), health-status trajectories, and health-management behavioral sequences. for each user, the attribute variables usually remain unchanged over time, whereas health status and behavioral sequences vary with time. hence, the user embeddings depend on both user i and time t to reflect the evolving dynamics. to properly guide the learning on these aspects, we propose a novel wide-and-deep neural network. the wide-and-deep structure was originally proposed for user response modeling in mobile apps. it combines two branches of user features (i.e., a "wide" branch and a "deep" branch) to facilitate user representation learning (cheng et al. ). in this study, we design a "wide" branch to process users' attribute features, to take into account that certain intervention suggestions can be more actionable for specific users given their personal attributes, and we apply a fully-connected structure to account for possible interactions among the attribute features. we then use a "deep" branch to learn sequence features, such as users' health-status trajectories, historical healthcare-intervention adoptions, and other health-management experiences in regard to self-monitoring activities and social behaviors. the sequence of historical intervention adoptions is included to capture the dynamics in users' preferences. together with the health-status trajectories, it captures users' evolving health histories and the corresponding changing healthcare preferences. in addition, based on the prior health theories mentioned in section . , health-management experiences, such as social support and self-monitoring activities, may also affect individuals' health behaviors and thus influence their preferences for healthcare interventions. therefore, we further include related behavior paths to capture the effect of health-management experiences on users' intervention-adoption behaviors. we use long short-term memory (lstm) with a self-attention mechanism to capture the dynamically changing patterns in these features and their correlation with adopted healthcare interventions. to address the fact that different sequence features have different dimension scales (e.g., number of social activities vs. numerical intervention attributes), we propose a self-organizing lstm module and add a balancer in the lstm cell to tackle unbalanced weights between different input features.
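as a rough pytorch sketch of the wide-and-deep user-embedding idea just described (a fully connected wide branch over static attributes plus an lstm-with-attention deep branch over behavior sequences, with an auxiliary outcome head), simplified relative to the paper: the dimensions are illustrative, a plain lstm with additive attention stands in for the self-organizing lstm with a balancer, and the loss functions are only indicated in a comment.

```python
import torch
import torch.nn as nn

class WideAndDeepUserEmbedding(nn.Module):
    """Simplified wide-and-deep user-embedding network (illustrative dimensions)."""

    def __init__(self, n_attr=4, seq_dim=16, hidden=32, embed_dim=32):
        super().__init__()
        self.wide = nn.Sequential(nn.Linear(n_attr, hidden), nn.ReLU())  # static attributes (gender, age, ...)
        self.deep = nn.LSTM(seq_dim, hidden, batch_first=True)           # health-status / behavior sequences
        self.attn = nn.Linear(hidden, 1)                                 # simple additive attention over time
        self.proj = nn.Linear(2 * hidden, embed_dim)                     # user embedding
        self.aux_head = nn.Linear(embed_dim, 1)                          # auxiliary health-outcome prediction

    def forward(self, attrs, sequences):
        wide_out = self.wide(attrs)                                      # (batch, hidden)
        seq_out, _ = self.deep(sequences)                                # (batch, time, hidden)
        weights = torch.softmax(self.attn(seq_out), dim=1)               # attention weights over time steps
        deep_out = (weights * seq_out).sum(dim=1)                        # (batch, hidden)
        user_emb = self.proj(torch.cat([wide_out, deep_out], dim=-1))
        return user_emb, self.aux_head(user_emb)

# training would combine a cosine-distance loss between user_emb and the adopted
# item embedding with the auxiliary health-outcome loss, as described in the text.
```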
the output of the deep model is the embeddings of the adopted healthcare interventions, with the loss function defined as the cosine distance between the last hidden layer (the user embedding) and the item embedding. in addition, we enhance the learning process of the deep branch with an auxiliary loss function that takes the healthcare outcome as the goal. we use the auxiliary loss function to incorporate the effects of healthcare outcomes on individuals' health behaviors and preferences for healthcare interventions. this is unique to the healthcare recommendation context, in which individuals' health behaviors are fundamentally driven by the goal of optimizing health outcomes. meanwhile, the auxiliary loss branch can also guide the neural network to properly extract signals from the sequence features, as, otherwise, the gradient flow will not be balanced and the shallow structure will dominate the gradient flow. an illustration of the proposed deep learning architecture for user embedding construction is provided in figure . each healthcare intervention is presented with a short title that highlights the main features of the intervention and a description or instruction that describes the detailed execution procedure. to capture the semantics embedded in this information, we propose a hybrid model, in which we apply an lstm to learn the semantics of the intervention descriptions, and we use an average token-level embedding to extract signals from the intervention titles. the outputs are the meta attributes of the intervention (e.g., duration, category, and/or intensity of the intervention). we further fine-tune the token-level embedding in the procedure of representation learning to ensure low information loss. in sum, the user embeddings and item embeddings help us to learn key contextual information for healthcare recommendations concerning users' health-behavior contexts and item attributes, which are then used as the input of our bandit recommendation model. due to space limits, more details on our user-embedding and item-embedding models are provided in appendix a . to evaluate the performance of our recommendation framework, we collected data from a leading online weight-loss platform. to help users to establish a healthy living style, the focal platform provides weight-loss challenges, which are behavioral treatment programs that help users to focus on a specific weight-loss goal in a short time period. examples of weight-loss challenges include diet-oriented challenges, such as "cut off processed carbs and include g of mixed veg in every meal," and activity-oriented challenges, such as " minutes of jogging every day." the diet-oriented and activity-oriented challenges provide behavioral guidance for individuals' weight-management routine. users can also find weight-loss-oriented challenges that help them to set goals directly for weight changes, such as losing a certain amount of weight during specific periods. participation in weight-loss-oriented challenges can help users to establish outcome-driven motivation. each weight-loss challenge is defined by a short title and a description that contains information on the challenge goal, duration, and instructions. users can choose to join any challenge as long as its starting date has not passed. in appendix a , we provide a screenshot of the challenge webpage to show how users can retrieve challenge information from the online platform. the focal platform does not incorporate any recommendation system to facilitate users' challenge selection.
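for completeness, here is a comparable sketch of the hybrid challenge-embedding model described above (an lstm over the description tokens plus an averaged token-level embedding of the title, trained to predict the annotated meta attributes); the vocabulary size, dimensions, and number of meta attributes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ChallengeEmbedding(nn.Module):
    """Hybrid item-embedding sketch: LSTM over the description, mean token embedding of the title."""

    def __init__(self, vocab_size=5000, token_dim=32, hidden=32, n_meta=11):
        super().__init__()
        self.tokens = nn.Embedding(vocab_size, token_dim)
        self.desc_lstm = nn.LSTM(token_dim, hidden, batch_first=True)
        self.meta_head = nn.Linear(hidden + token_dim, n_meta)   # predicts the annotated meta attributes

    def forward(self, title_ids, desc_ids):
        title_emb = self.tokens(title_ids).mean(dim=1)           # averaged token-level title embedding
        _, (h, _) = self.desc_lstm(self.tokens(desc_ids))        # last hidden state summarizes the description
        item_emb = torch.cat([h[-1], title_emb], dim=-1)         # challenge embedding used by the bandit
        return item_emb, self.meta_head(item_emb)
```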
during our investigation period, there were more than challenges provided to users. users may find it difficult to select challenges to join, as they may lack the ability to discern, from various choice alternatives, which challenges are suitable for their weight management. in addition, users incur a significant search cost, as they need to spend time reading the challenge descriptions and/or instructions before deciding which challenge to join. these problems can potentially be solved by providing personalized challenge recommendations to support healthy behaviors. our investigation of weight-loss challenge recommendations can help to improve the match between individuals and weight-loss challenges and, thus, improve individuals' weight-management performance. from the platform's perspective, the recommendations can enhance users' participation experience and, thus, contribute to user retention and platform sustainability. we collected three datasets to support our investigation of recommendation performance. the first dataset contains descriptive information for each challenge provided on the platform. during our data collection window, there were challenges provided on the platform in total. for each challenge, we collected the title, description, and duration. on average, users can choose from about challenges each week. the second dataset is users' challenge-selection histories; that is, we recorded for each user the challenge(s) that he or she selected per week. this dataset enables us to learn users' preferences for weight-loss challenges. the third dataset contains auxiliary information for each user, such as gender, age, membership duration, initial weight when first joining the platform, online weigh-in activities, the number of friends, and the posts published in the community forum. we use this information to capture users' heterogeneous weight-management contexts. in particular, gender and age are two factors that directly affect individuals' weight status. membership duration measures users' overall weight-management experience on the platform. initial weight and weekly weigh-in records help us measure individuals' weight-loss status and health histories. the number of friends and forum posts provide proxies for the amount of social support available to individuals (shumaker and brownell ; yan ); thus, we use them to capture the social contexts of users' weight management. we provide a summary of key data statistics in appendix a . users on the focal platform, on average, chose two challenges per week. when users chose any challenge, they chose multiple challenges about % of the time. as noted, there are three major challenge types on the platform: weight-loss oriented, diet oriented, and exercise oriented. we find that users tend to choose different types of challenges whenever they choose multiple challenges. specifically, users choose more than two challenge types % of the time when they choose multiple challenges; they choose all three types of challenges about % of the time. these results indicate the existence of diversity in users' preferences for weight-loss challenges. in addition, we find that users' selection of challenge types drifts over time. that is, users may prefer certain challenge types at the beginning of a time period and gradually shift to other challenge types as time goes by.
these findings provide evidence for the dynamics of users' preferences, which may be due to users' transitions to different weight-loss statuses, in which they need different types of support. it is also likely that users gradually establish their personal tastes in regard to weight-loss challenges during the process of challenge participation. these findings thus provide support for our recommendation design. weight-loss challenges are presented in a textual format with a title and a description. they aim to help users to manage short-term weight-loss goals, such as changing a dietary behavior, increasing physical exercise, and reducing weight. goal setting can reinforce individuals' motivation, and well-structured goal formulation can have positive and directional effects on individuals' task performance (les macleod edd ; locke and latham ). the smart metric (i.e., specific, measurable, attainable, relevant, and time-bound) has been widely used as a gold standard in areas such as education and healthcare for assessing the quality of goals (doran ; ogbeiwi ). this metric can help individuals to clearly identify the direction for logical action planning and implementation (ogbeiwi ; ogbeiwi ). thus, smart-related goal characteristics can influence how individuals perceive the effectiveness of a goal and affect their choice-making behaviors in deciding which goal to pursue. we use the smart metric to characterize each challenge based on the challenge description data. as the number of challenges is large and users' challenge-selection data are comparatively sparse, we need to quantify the similarities among challenges, and the smart-based features can properly guide our calibration of challenge similarity. in particular, corresponding to the goal-setting dimensions specified by the smart metric, we construct the following meta attributes for each challenge: whether the challenge is specifically defined (specificity), whether the challenge goal is measurable (measurability), the intensity level of the challenge (attainability), whether the challenge is related to diet or physical activity (relevancy), and the time span of the challenge (duration). in table , we provide a summary of the annotated challenge meta attributes, which will be used for learning the challenge-embedding representation and the downstream recommendation task.

specificity: whether a challenge is specifically defined (0 or 1)
measurability: whether a challenge goal is measurable (0 or 1)
diet: whether a challenge is related to dietary behaviors (0 or 1)
intensity_diet: intensity level for a diet-oriented challenge (l, m, h)
activity: whether a challenge is related to physical activities (0 or 1)
intensity_activity: intensity level for an activity-oriented challenge (l, m, h)
weight_loss: whether a challenge contains a goal for weight changes (0 or 1)
intensity_weight_loss: intensity level for a weight-loss-oriented challenge (l, m, h)
motivational: whether a challenge contains motivational words/sentences (0 or 1)
self_monitoring: whether a challenge requires individuals to regularly monitor and report their weight-loss progress, e.g., body weight, daily diet, running mileage (0 or 1)
duration: time span (in weeks) of a challenge

in addition to smart-based attributes, we consider two other features that may affect individuals' engagement in challenge participation: motivational and self-monitoring. motivational characterizes challenges from the perspective of the goal statement, which has been suggested to be important in helping individuals to build up inner motivation (locke and latham ).
self-monitoring is an important step in goal fulfillment, as it helps individuals to process their performance toward goal achievement (bandura ). we use self-monitoring to indicate whether a challenge encourages individuals to regularly monitor and report their weight-loss progress. the detailed annotation procedure is provided in appendix a . as noted in section . , our user-embedding model adopts a wide-and-deep network structure, in which we use the wide branch to capture users' attribute features and the deep branch to capture the sequence features. in our evaluation context, users' attribute features include gender, age, initial weight, and membership duration. the sequence features include three parts. the first part captures users' health status, that is, their historical weight variations. the second part is the sequence of historical challenges chosen by individuals, which we use to account for users' personal tastes. the third part concerns users' other behavioral sequences, such as their past social activities (e.g., establishing friendships with other users and publishing forum posts) and self-monitoring activities (e.g., weigh-ins). the auxiliary loss head is designed to measure the weight loss in the next time period, where we choose a combined loss of mse for absolute value prediction and cross-entropy loss for weight-loss sign prediction. we present the detailed network structure and the loss functions in appendix a . for our challenge-embedding model, we use the challenge name and description as the inputs, and the annotated challenge meta attributes based on the smart metric are the outputs. the annotated challenge meta attributes help us to depict the key characteristics of a weight-loss-related goal; thus, they provide a good standard for calibrating challenge similarity in our focal context. in operationalizing our diversity constraint, we identify weight loss as the outcome-oriented dimension, as it is the health goal that individuals aim to achieve in our focal context. we identify diet and physical exercise as two behavior-oriented dimensions, as they are two essential behavioral-regulation aspects in weight management. with respect to these dimensions, we ensure that our recommendations cover all three challenge types, i.e., weight-loss oriented, diet oriented, and exercise oriented. therefore, our diversity constraint is specified as s_it ∩ dim_weightloss ≠ ∅, s_it ∩ dim_diet ≠ ∅, and s_it ∩ dim_exercise ≠ ∅, where s_it represents the recommended challenge set, dim_weightloss represents the weight-loss-oriented dimension, dim_diet denotes the diet-oriented dimension, and dim_exercise denotes the exercise-oriented dimension. we apply our recommendation framework to the weight-management context described in section , with the aim of promoting users' engagement in weight-loss challenges on the platform. in particular, we implement the algorithm introduced in section . to offer top-k challenges to users on a weekly basis. the time span of recommendation is weeks (i.e., the same as our data collection window). to demonstrate the effectiveness of our recommendation design, we follow the design-science paradigm to rigorously evaluate our recommendation framework through a series of experiments. we first examine the effectiveness of our deep-learning embeddings in capturing user characteristics and challenge attributes. we then apply different evaluation approaches to test each of our model components as well as to compare our model against state-of-the-art recommendation systems.
to show how the construction of deep-learning models improves feature engineering, we use t-sne to visualize the embeddings in a two-dimensional space (maaten and hinton ). t-sne is a nonlinear dimensionality reduction technique that is well suited for visualizing high-dimensional data. it can project high-dimensional vectors into lower dimensions while largely preserving the local data structure, so it helps us to understand data patterns in a more intuitive way. in a t-sne plot, two points that are close to each other indicate that the corresponding embedding representation vectors are similar. we provide the visualization results for challenge embeddings and user embeddings in figures and , respectively. specifically, we present the t-sne plot for our constructed challenge embeddings in figure ( ). in the same figure, we provide the t-sne plots for two state-of-the-art deep-learning word-embedding models, bert and fasttext. we use these two models as a benchmark for evaluating the performance of our proposed challenge-embedding model. in the plot, we use different colors to indicate each challenge type, such as weight-loss oriented, diet oriented, and exercise oriented. we use the color degree to denote the intensity level of a challenge: a deeper color indicates a challenge of higher intensity. for example, diet-oriented challenges are denoted by orange points, and among the points, there are three color degrees: light orange, dark orange, and orange-red, representing low-, medium-, and high-intensity levels, respectively. we find that, in figure ( ), challenges that belong to the same type are tightly clustered. in addition, within each challenge-type cluster, challenges of the same intensity level tend to be close to each other. these patterns indicate that our challenge-embedding model can capture intrinsic challenge attributes well, especially the ones that are key to goal-setting theory (doran ). in comparison, the benchmark models cannot clearly distinguish these challenge patterns. [figure: t-sne plots of challenge embeddings: ( ) proposed, ( ) bert, ( ) fasttext] the t-sne plot for our constructed user embeddings is presented in figure ( ), with points distinguished by user features such as gender, age, and weight-loss status. note that these features are most directly related to our weight-loss context; in particular, gender and age are two demographics that directly affect users' body weight, and weight-loss status reflects users' in-period weight variations. we compare our proposed user-embedding model with two benchmark models, collab_learner and tabular. the results are presented in figures ( ) and ( ), respectively. the collab_learner model and the tabular model are two encapsulated python learners provided in the fastai library. in particular, the collab_learner model learns user representations from the historical challenge-selection data; however, it is not able to incorporate users' personal characteristics. the tabular model is a deep-learning model that learns user embeddings based on users' tabular attributes, such as gender and age, but does not extract signals from sequential data, such as users' health histories and behavior paths. as shown, these two models do not perform as well as our model, as there is no explicit user pattern displayed. [figure: t-sne plots of user embeddings: ( ) proposed, ( ) collab_learner, ( ) tabular, ( ) sampled users] finally, the sequence shape of the user clusters produced by our model in figure ( ) motivates us to further investigate granular individual-level patterns, as the points in a sequence are likely generated by the same or similar users.
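a t-sne visualization of this kind can be reproduced with standard tooling; the sketch below uses scikit-learn and matplotlib and is not the authors' plotting code (the perplexity and other settings are arbitrary).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_embeddings_2d(embeddings, labels, title="t-SNE of embeddings"):
    """Project high-dimensional embeddings to 2-D and color the points by a categorical label."""
    points = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(np.asarray(embeddings))
    for lab in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == lab]
        plt.scatter(points[idx, 0], points[idx, 1], s=10, label=str(lab))
    plt.legend()
    plt.title(title)
    plt.show()
```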
we randomly sampled several individual users and plotted their corresponding embeddings in figure ( ). we find that the embeddings of the same user are located close to each other and tend to be concatenated into a trajectory, and that the embeddings of different users are located relatively far apart. these results show that our proposed user-embedding model can effectively capture the sequential patterns in users' behaviors. we conduct an ablation analysis to compare our model with a series of baseline mabs. this allows us to show that each of our model components (i.e., the deep-learning-based feature engineering procedure and the diversity constraint) is effective in helping users to find the relevant challenges. the baseline mabs are the counterpart mab models that partially incorporate or do not incorporate the proposed model components. specifically, the baseline mabs that we investigate include the mab without user embeddings, the mab without challenge embeddings, the mab without either embedding, the mab without the diversity constraint, and the mab without either embedding or the constraint. when user embeddings are not incorporated, we use users' attribute features (e.g., gender, age, etc.) to account for recommendation personalization. when challenge embeddings are not incorporated, we use the annotated challenge features to capture the inherent attributes of challenges, namely, the variables listed in table . evaluating an explore/exploit policy is difficult because we typically do not know the reward of an action that was not chosen. possible solutions include doubly-robust estimation (dudík et al. ), offline precision evaluated by a preference set (qin et al. ), and simulation. the first evaluation approach, doubly-robust estimation, is an offline data evaluation approach that utilizes pre-collected historical data to evaluate policy performance. the historical data are assumed to contain three sets of information: action, context, and reward. as the data are pre-collected, we are able to observe only the rewards for the chosen actions in the data. to adjust for the potential bias caused by the data collection process, the doubly-robust estimator combines two policy evaluation methods, direct simulation (ds) and inverse propensity score (ips). formally, let g denote the offline dataset, which contains actions a, contexts v, and rewards r. in our context, an action a refers to the platform's provision of a challenge in the data, the context v includes an individual's weight-management context x_it and the challenge features, and r is the user's feedback, that is, whether the user selects the challenge. let s_it denote the set of challenges recommended to user i at week t, with |s_it| = k. our doubly-robust estimator can be expressed as v_dr = (1 / |g|) Σ_{(v, a, r) ∈ g} [ Σ_{a' ∈ s_it} ĵ(v, a') + 1{a ∈ s_it} (r - ĵ(v, a)) / p(a | v) ], where ĵ is a reward simulator and p is the propensity of challenge provision in the data. the rationale of this method is that, when data are not available, the method uses a pre-trained reward predictor to simulate the reward; otherwise, it applies a correction to the reward predictor using the actual data. the second evaluation method measures recommendation precision. following previous studies (qin et al. ; qin and zhu ), we construct each user's preference set as the set of challenges selected by the user in the data. the recommendation precision is thus the overlap ratio between the recommendation set and the preference set. finally, in light of previous theoretical mab studies (hertz et al. ; sani et al. ), we examine our model performance through a simulated environment.
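the doubly-robust idea described above can be sketched as follows for set-valued recommendations; this is a generic illustration under my own assumptions (logged tuples of context, shown challenge, and observed reward, with known logging propensities and a pre-trained reward model), and the paper's exact estimator may differ in detail.

```python
def doubly_robust_value(logs, policy_set, reward_hat, propensity):
    """Estimate the value of a recommendation policy from logged interaction data.

    logs        : iterable of (context, logged_action, observed_reward)
    policy_set  : context -> set of challenges the evaluated policy would recommend
    reward_hat  : (context, action) -> simulated reward (direct-simulation component)
    propensity  : (context, action) -> probability the logging policy provided the action
    """
    total, n = 0.0, 0
    for context, action, reward in logs:
        recs = policy_set(context)
        value = sum(reward_hat(context, a) for a in recs)           # model-based estimate for the set
        if action in recs:                                          # IPS correction where real feedback exists
            value += (reward - reward_hat(context, action)) / propensity(context, action)
        total += value
        n += 1
    return total / max(n, 1)
```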
specifically, we construct a logistic predictor for users' binary challenge-selection decisions, that is, p(selection) = σ(ζ · v), where v is a concatenation of user embeddings and challenge embeddings. the weight vector ζ could have been chosen arbitrarily, but it was in fact a perturbed version of the weight vector trained on a randomly constructed training set (nguyen et al. ), and the performance evaluation is conducted on a test set. this simulator is omniscient, in the sense that it has full knowledge of users' preferences and of the actual amount of reward accrued by recommendations. we provide the details of these evaluation approaches in appendix a . the evaluation results are provided in table . each value in the table represents users' average selection rate during the entire recommendation course. a superscripted asterisk denotes that a benchmark model performs significantly worse than the proposed model. our results show that the mabs that include only one of the components have inferior performance (i.e., a lower average challenge-selection rate) compared with our proposed model. specifically, the mab without the diversity constraint is shown to be significantly worse by the doubly-robust estimation and simulation methods. the mab without user embeddings or challenge embeddings (or both) is shown to have worse performance by all three evaluation methods. finally, we find that the mab without either embedding or the diversity constraint performs worse than the mabs that partially incorporate the model components. these results indicate that each of our proposed model components is effective. as compared to users' attribute features, user embeddings can better capture the sequential information embedded in users' health histories and behavior paths and, thus, are more effective. challenge embeddings are more effective than the annotated challenge features, as they are able to extract semantic information from the textual challenge descriptions. in addition, as most annotated challenge features are categorical and one-hot encoded, they may not provide much information for the learning process. in comparison, challenge embeddings can better calibrate the similarity among challenges and make the learning more effective. [table: ablation evaluation results; a superscripted asterisk denotes that a benchmark model performs significantly worse than the proposed model; significance levels: * p < . , ** p < . , *** p < . ] in figure , we plot the recommendation performance across time to show the learning curve of each model. the x-axis denotes recommendation rounds, and the y-axis denotes the average challenge-selection rate up to round t. it is shown that our model achieves the highest learning rate across all periods, regardless of the evaluation approach taken. that is, our model can boost the average challenge-selection rate faster than the benchmark models can. for example, when evaluated by simulation, our model increases the average challenge-selection rate from approximately . to approximately . after the -week recommendation phase, which is a % increase. this is followed by the mab with no diversity constraint (~ %), the mab with no user embeddings (~ %), the mab with no challenge embeddings (~ %), the mab without either embedding (~ %), and the mab without either the diversity constraint or the embeddings (~ %). these results further highlight the effectiveness of our deep-learning-based feature engineering and diversity constraint in the learning procedure.
we compare our proposed recommendation framework against a wide range of benchmark models. in particular, for benchmark bandit models, we consider ucb and ε-greedy, which are two classic online-learning methods for solving the "exploitation-versus-exploration" tradeoff. for batch-learning-based models, we consider a variety of collaborative filtering methods, such as context-aware recommenders and matrix-factorization-based models. we also consider content-based filtering and hybrid filtering. content-based filtering is able to offer recommendations based on item features. hybrid filtering further combines content-based filtering with collaborative filtering to incorporate information embedded in users' challenge-selection histories. as the batch-learning-based models make recommendations by exploiting users' item-selection histories, we use the first four weeks of data to train the algorithms. finally, we consider pure exploitation and pure exploration, which are two recommendation schemes that do not seek a balance between exploitation and exploration. we summarize our benchmark models in table . the implementation details of these benchmark models are provided in appendix a .

ε-greedy: a bandit algorithm that chooses the arm with the seemingly highest average reward with probability 1 - ε and explores a random arm with probability ε;
cacf: context-aware collaborative filtering, which incorporates the contexts of users' item selections as weights into a normal collaborative filtering procedure (chen );
scf: social collaborative filtering, which formulates a neighborhood-based method for cold-start collaborative filtering in a generalized matrix algebra framework (sedhain et al. );
pmf: probabilistic matrix factorization, a model-based collaborative filtering approach that uses matrix factorization under a probabilistic framework to estimate user-item interactions (mnih and salakhutdinov );
camf: context-aware matrix factorization, which is an extension of the classic matrix factorization approach for incorporating contextual information (baltrunas et al. );
cb: content-based filtering, an approach to offer recommendations based on content similarities of items (bieliková et al. ; pon et al. );
hybrid_pure: a hybrid model that combines pure collaborative filtering with cb using mixed hybridization (burke );
hybrid_cacf: a hybrid model that combines cacf and cb using mixed hybridization;
pure exploitation: a model that selects the best option given current knowledge;
pure exploration: a model that fully randomizes recommendations.

similarly, we calibrate recommendation performance by users' average challenge-selection rate and evaluate it through the aforementioned three evaluation approaches, i.e., doubly-robust estimation, offline precision, and simulation. our evaluation results are provided in table . [evaluation measures: ( ) doubly-robust estimation, ( ) offline precision, ( ) omniscient simulator] the results show that our proposed model has the best recommendation performance under all evaluation measures. it achieves an average challenge-selection rate of . % when evaluated by doubly-robust estimation; . %, by offline precision; and . %, by simulation. ucb and ε-greedy are shown to have comparable performance (~ % for doubly-robust estimation, ~ % for offline precision, and ~ % for simulation), but both perform significantly worse than our model. batch-learning-based models generally do not perform well (mainly below %). the performance inferiority may be due to users' dynamic preferences for weight-loss challenges. in other words, the batch-learning-based models assume that users' preferences can be well represented by their past behavior patterns (adomavicius and tuzhilin ; sahoo et al. ); thus, these models can be biased when users' preferences contain dynamic patterns. finally, our results show that pure exploitation and pure exploration achieve worse recommendation performance as compared to our model. this performance gap indicates the importance and necessity of balancing the exploitation-versus-exploration tradeoff. a pure-exploitation method may get stuck in a poor local optimum. pure exploration, in contrast, over-explores users' preferences; it fully randomizes the recommendations without utilizing or learning from information embedded in users' past behaviors. note that pure exploitation and pure exploration can often be seen in the design of a/b testing. specifically, in an a/b test, experimenters first spend a short time period on pure exploration, whereby they randomly assign users to different groups to examine the performance of policy variants. they then engage in a long period of pure exploitation, assigning all of the users to the group that achieves the best performance. in practice, the pure exploratory phase can be expensive or even infeasible to implement. for example, in a health-management context, it is usually infeasible to arbitrarily assign individuals to a treatment plan. instead of two distinct periods of pure exploration and pure exploitation, a bandit-driven design adaptively combines exploration and exploitation. thus, it can reduce the opportunity cost incurred in the exploratory phase and help service providers to achieve better performance. [table: benchmark comparison results; a superscripted asterisk denotes that a benchmark model performs significantly worse than the proposed model; significance levels: * p < . , ** p < . , *** p < . ] in a typical healthcare context, individuals' behavior patterns likely change over time (johnson et al. ; king et al. ). in this experiment, we investigate the recommendation results for the users whose choices tend to vary considerably, as a means to examine whether our recommendation framework can capture the dynamics in users' preferences well. from the test-user set, we select the users whose challenge choices have the largest embedding variance. we implement our recommendation algorithm for the selected users and compare the recommendation performance with that obtained using the full test-user set. table presents the results for the new test-user set. our model is shown to outperform all of the benchmark models by all evaluation measurements. in particular, our model achieves an average challenge-selection rate of . % when evaluated by doubly-robust estimation; . %, by offline precision; and . %, by simulation. to better show the performance variations, we plot the differences between the new recommendation results and the original results on the full test set in figure . here, we present the performance changes measured by doubly-robust estimation. the performance changes measured by offline precision and simulation are similar and are provided in appendix a . we find that bandit-driven models, such as our proposed model, ucb, and ε-greedy, have different degrees of performance increase. (as challenge embeddings are vectors, we define the variance of embeddings as the minimum value of the variances of the elements.)
in contrast, batch-learning-based models generally have a performance decrease. these results suggest that the advantage of the online-learning scheme is further strengthened when evaluated on the dynamic users whose preferences tend to vary frequently. the performance decrease of batch-learning-based models indicates that the "first learn, then earn" recommendation scheme may perform even worse when users' preferences exhibit strong dynamics. this may be because the dynamic patterns in users' preferences cannot be fully captured by the historical data. bandit-driven models, in contrast, are able to actively collect users' feedback on recommendations and, thus, can capture changes in users' preferences more promptly. [table: results for dynamic users; a superscripted asterisk denotes that a benchmark model performs significantly worse than the proposed model; significance levels: * p < . , ** p < . , *** p < . ] [figure: performance variation for dynamic users] in the diversity analysis, we examine whether the recommendation models can effectively learn users' diverse challenge preferences in the data. specifically, for our proposed model and each of the benchmark models, we calculate the recommendation frequency for each challenge type to construct a diversity distribution. we then compare the recommendation frequencies with users' challenge-selection frequencies in the data, which reveal users' true preferences for challenge types. we use the jensen-shannon divergence (jsd) to measure the similarity between two diversity distributions (endres and schindelin ; fuglede and topsoe ). a small jsd value indicates high similarity between two diversity distributions. in figure , we visualize the diversity distribution for each recommendation model and present the corresponding jsd values. the first bar in the figure represents the diversity distribution in users' challenge-selection histories observed in the data. the second bar represents the diversity distribution in the recommendations provided by our proposed model. as can be seen, the first two bars are very similar to each other, indicating that the recommendations produced by our model are diversified in a way that is similar to users' actual challenge-selection histories. the recommendation diversity distributions produced by the benchmark models generally have a larger difference from the observed challenge-selection data. for example, the bar that corresponds to cacf is quite different from the first bar. these observations are confirmed by our jsd results, which show that our proposed model has the smallest jsd value (i.e., . ) among all of the models compared. combined with the results of the earlier-discussed experiments, these findings provide evidence that our proposed recommendation framework can well support users' diverse healthcare preferences and that the diversity constraint can further guide the recommendation system to explore along each of the weight-management dimensions, thus improving learning efficiency. in this experiment, we aim to examine whether our recommendation framework can benefit more users. we define user improvement as the percentage of users who receive more preferred items from a focal recommendation algorithm than from a baseline algorithm. we use probabilistic matrix factorization (pmf) as our baseline algorithm. pmf models hidden user representations based on users' challenge-selection histories. it does not, however, incorporate contextual information of users' selection behaviors, and it is batch-learning-based.
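returning to the diversity analysis above, the jsd between a model's recommendation-type distribution and users' observed selection distribution can be computed as in this sketch; the example frequencies are made up, and note that scipy returns the jensen-shannon distance (the square root of the divergence).

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def jsd(p, q):
    """Jensen-Shannon divergence between two diversity distributions over challenge types."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()              # normalize frequencies into distributions
    return jensenshannon(p, q, base=2) ** 2      # square the distance to obtain the divergence

# illustrative shares over (weight-loss, diet, exercise) challenge types
print(jsd([0.30, 0.40, 0.30], [0.28, 0.45, 0.27]))
```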
thus, by comparing with pmf, we are able to assess the value of the recommendation context along with the online-learning scheme. we present our results for user improvement in figure . it is shown that our proposed recommendation approach has the highest user improvement rate (~ . %), suggesting that approximately % of users receive more preferred challenges from our recommendation framework than from pmf. comparatively, the other recommendation approaches have a lower user improvement rate (all below %). these results indicate that our recommendation framework can improve recommendations for a larger portion of the user population on the platform. the preceding experiments evaluate the effectiveness of our recommendation framework in improving users' challenge-selection rates. in the healthcare context, it is also important that service providers take further steps to evaluate the corresponding health-related outcomes. although necessary, increasing users' engagement in interventions may not be enough, as it does not directly guarantee an improvement in health. therefore, in this experiment, we further examine our recommendation performance in improving users' weight-loss outcomes. in particular, we explore a different recommendation target: users' in-period weight-loss rate. that is, we use users' in-period weight-loss rate as feedback to guide the learning of our recommendation framework. different from prior experiments, generating weight-loss feedback requires two steps of simulation. first, we use the logistic predictor described in section . to simulate whether users will choose a particular item. second, we simulate users' in-period weight-loss status (e.g., weight gain or non-gain) based on their choices. we use users' weigh-in data to train a logistic predictor for this simulation, which takes user embeddings and the average challenge embeddings of users' choices as the input variables and users' weight-loss status as the prediction target. as shown in table , our proposed recommendation framework has the best performance under the new recommendation target, achieving an average in-period weight-loss rate of . %. in contrast, the benchmark models achieve a significantly worse average in-period weight-loss rate. these results indicate that our recommendation design is effective not only in helping users to find preferable challenges to engage in but also in further helping them to achieve better weight-loss performance. [table: weight-loss-outcome results, including pure exploitation and pure exploration; a superscripted asterisk denotes that a benchmark model performs significantly worse than the proposed model; significance levels: * p < . , ** p < . , *** p < . ] in this study, we take a design-science perspective to develop a novel recommendation framework for providing personalized healthcare recommendations to users on online healthcare platforms. the design of our recommendation framework is motivated by several unique patterns in individuals' health behaviors. first, due to the evolving process of health management, users' health behaviors may continuously change with time. second, users may need multiple types of healthcare information to manage different health aspects. these characteristics indicate that users' healthcare preferences can be dynamic and diverse. to this end, we propose a deep-learning and diversity-enhanced mab framework. the mab is able to adaptively learn users' changing behavior patterns while promoting diversity along the exploration process.
to better adapt an mab to the healthcare recommendation context, we further synthesize two model components into our framework based on prominent health-behavior theories. the first component is a deep-learning-based feature construction procedure, aimed at capturing important healthcare recommendation contexts, such as healthcare interventions' inherent attributes and individuals' evolving health conditions and health-behavior paths. the second component is a diversity constraint, which we use to ensure that recommendations are provided in each of the major health-management dimensions, so that individuals can receive well-rounded support for their health management. we conduct a series of experiments to test each of our model components as well as to compare our model against state-of-the-art recommendation systems, using data collected from a representative online weight-loss platform. the results of the experiments provide strong evidence for the effectiveness of our proposed recommendation framework. our study contributes to the emerging literature on the application of business intelligence (abbasi et al. ; chen et al. ). we demonstrate that prescriptive analytics can be integrated with it artifacts to generate applicable insights. in particular, the innovative healthcare recommendation framework that we developed provides an important contribution to the literature on recommendation systems and online healthcare systems. to the best of our knowledge, we are among the first to combine mab models with deep-learning-based embeddings to improve the characterization of recommendation contexts. in addition, the inclusion of diversity constraints demonstrates a way of promoting recommendation diversity according to pre-designed dimensions. this innovation can be of significance in professional industries, in which domain expertise needs to be incorporated to guide recommendation diversification. from a practical perspective, our recommendation framework can be used to address real-world challenges in healthcare recommendation problems. the effectiveness of our framework, as demonstrated by our results, implies great potential for using our recommendation design to provide users with tailored engagement suggestions. although our recommendation design is proposed for assisting individuals' engagement in online health management, the framework can be extended to broader problem settings. for example, the combination of an mab and deep-learning-based feature engineering can be used to solve other healthcare problems, such as drug discovery, disease diagnosis, clinical trials, and therapy development. decision making in such healthcare problems usually involves complex contextual knowledge (e.g., drug structures, patient health histories, symptom development paths), and decision-makers usually do not have full knowledge of the environment (e.g., whether a drug is effective). the online-learning framework of an mab can help decision-makers to better cope with uncertainties in the healthcare environment, and deep-learning models can be combined to improve the characterization of decision-making contexts. in addition, we demonstrate a way to formulate recommendation constraints, which can be used to incorporate domain expertise to guide the recommendation procedure. our recommendation framework can also be extended to fields beyond healthcare. real-world decision-making problems, such as financial investment, product pricing, and marketing, usually contain different levels of uncertainty.
the uncertainty may arise because decision-makers do not gather enough data to guide their decision making, or because the decision-making environment is frequently changing (e.g., market instability, technology change, policy environment fluctuation). thus, it is of practical importance to develop an adaptive decision-making framework that can respond well to the uncertainty in the environment. decision-makers may consider combining an mab with deep-learning embeddings to learn the context-dependency of their decision results while adaptively adjusting their strategies to minimize opportunity cost. in addition, in real-world recommendation problems, it is usually desirable to recommend diversified content to maximize the coverage of the information that users find interesting and thereby improve their engagement experience. our formulation of the diversity constraint can be used to strengthen recommendation diversification along theory-guided dimensions in multiple application areas.
big data research in information systems: toward an inclusive research agenda
toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions
finite-time analysis of the multiarmed bandit problem
matrix factorization techniques for context aware recommendation
social cognitive theory of self-regulation
health promotion from the perspective of social cognitive theory
effective hierarchical vector-based news representation for personalized recommendation
hybrid recommender systems: survey and experiments
an empirical evaluation of thompson sampling
context-aware collaborative filtering system: predicting the user's preference in the ubiquitous computing environment
business intelligence and analytics: from big data to big impact
wide & deep learning for recommender systems
should i stay or should i go? how the human brain manages the trade-off between exploitation and exploration
social support and patient adherence to medical treatment: a meta-analysis
there's a smart way to write management's goals and objectives
doubly robust policy evaluation and learning
a new metric for probability distributions
blockbuster culture's next rise or fall: the impact of recommender systems on sales diversity
the answer to weight loss is easy-doing it is hard!
jensen-shannon divergence and hilbert space embedding
bandit processes and dynamic allocation indices
action centered contextual bandits
maintenance of lost weight and long-term management of obesity
stochastic satisficing account of confidence in uncertain value-based decisions
recommendation systems: principles, methods and evaluation
the association between weight loss and engagement with a web-based food and exercise diary in a commercial weight loss programme: a retrospective analysis
understanding variation in chronic disease outcomes
robust multiarmed bandit problems
social support processes and the adaptation of individuals with chronic disabilities
recommender systems: from algorithms to user experience
internet-based weight control: the relationship between web features and weight loss
making smart goals smarter
a contextual-bandit approach to personalized news article recommendation
diversity-promoting deep reinforcement learning for interactive recommendation
a theory of goal setting & task performance
visualizing data using t-sne
unpacking the exploration-exploitation tradeoff: a synthesis of human and
dynamic online pricing with incomplete information using multiarmed bandit experiments
probabilistic matrix factorization
reinforcement learning for bandit neural machine translation with simulated human feedback
transforming physician practices to patient-centered medical homes: lessons from the national demonstration project
why written objectives need to be really smart
general concepts of goals and goal-setting in healthcare: a narrative review
when more is less: the paradox of choice in search engine use
the filter bubble: how the new personalized web is changing what we read and how we think
empirical analysis of the impact of recommender systems on sales
from virtual parties to ordering food, how americans are using the internet during covid-
tracking multiple topics for finding interesting articles
user-centric evaluation framework for recommender systems
contextual combinatorial bandit and its application on diversified online recommendation
promoting diversity in recommendation by entropy regularizer
recommender systems: introduction and challenges. recommender systems handbook
a hidden markov model for collaborative filtering
risk-aversion in multi-armed bandits
customer acquisition via display advertising using multi-armed bandit experiments
social collaborative filtering for cold-start recommendations
toward a theory of social support: closing conceptual gaps
improving health by taking it personally
uncertainty and exploration in a restless bandit problem
ensemble contextual bandits for personalized recommendation
analysis of recommender systems algorithms
to stay or leave?: the relationship of emotional and informational support to commitment in online health support groups
obesity and overweight
good intentions, bad outcomes: the effects of mismatches between social support and health outcomes in an online weight loss community
feeling blue? go online: an empirical study of social support among patients
online context-aware recommendation with time varying multi-armed bandit
deep learning based recommender system: a survey and new perspectives

key: cord- -eqn kl p authors: drissi, nidal; ouhbi, sofia; janati idrissi, mohammed abdou; ghogho, mounir title: an analysis on self-management and treatment-related functionality and characteristics of highly rated anxiety apps date: - - journal: int j med inform doi: . /j.ijmedinf. .
sha: doc_id: cord_uid: eqn kl p background and objective: anxiety is a common emotion that people often feel in certain situations. but when the feeling of anxiety is persistent and interferes with a person's day to day life then this may likely be an anxiety disorder. anxiety disorders are a common issue worldwide and can fall under general anxiety, panic attacks, and social anxiety among others. they can be disabling and can impact all aspects of an individual's life, including work, education, and personal relationships. it is important that people with anxiety receive appropriate care, which in some cases may prove difficult due to mental health care delivery barriers such as cost, stigma, or distance from mental health services. a potential solution to this could be mobile mental health applications. these can serve as effective and promising tools to assist in the management of anxiety and to overcome some of the aforementioned barriers. the objective of this study is to provide an analysis of treatment and management-related functionality and characteristics of high-rated mobile applications (apps) for anxiety, which are available for android and ios systems. method: a broad search was performed in the google play store and app store following the preferred reporting items for systematic reviews and meta-analysis (prisma) protocol to identify existing apps for anxiety. a set of free and highly rated apps for anxiety were identified and the selected apps were then installed and analyzed according to a predefined data extraction strategy. results: a total of anxiety apps were selected ( android apps and ios apps). besides anxiety, the selected apps addressed several health issues including stress, depression, sleep issues, and eating disorders. the apps adopted various treatment and management approaches such as meditation, breathing exercises, mindfulness and cognitive behavioral therapy. results also showed that % of the selected apps used various gamification features to motivate users to keep using them, % provided social features including chat, communication with others and links to sources of help; % offered offline availability; and only % reported involvement of mental health professionals in their design. conclusions: anxiety apps incorporate various mental health care management methods and approaches. apps can serve as promising tools to assist large numbers of people suffering from general anxiety or from anxiety disorders, anytime, anywhere, and particularly in the current covid- pandemic. education, and relationships [ , ] . the exact causes of anxiety disorders are still unknown. according to the national institute of mental health, it is likely to be a combination of genetic and environmental factors [ ] . other possible factors that can lead to susceptibility include brain chemistry, personality type, exposure to certain mental and/or physical disorders, trauma and stress [ ] . the covid- outbreak, in addition to being a public health emergency, is also affecting mental health in individuals on a global scale causing people to suffer from stress, anxiety, and depression [ , ] . the pandemic is also triggering feelings of fear, worry, sadness, and anger [ , ] . quarantines, self-isolation, fear of the unknown, loss of freedom and other factors are causing psychological issues in people around the world [ , ] . 
these situations and circumstances can trigger several anxiety disorders, mainly separation anxiety disorder which is defined as fear of being away from home or loved ones, illness anxiety disorder which is defined as anxiety about a person's health (formerly called hypochondria) [ ] and panic attacks that are affecting a large number of people because of excessive worrying. psychiatric patients are additionally at a higher risk of experiencing symptoms related to psychological issues caused by the pandemic [ ] . people with preexisting anxiety disorders are showing aggravation of their conditions, for example, many people with ocd are developing new fixations on the covid- virus and are experiencing compulsive cleaning [ ] . due to the covid- pandemic, social interactions have significantly decreased in several parts of the world. while this may have provided relief to some people with social anxiety, it is possible this lack of interaction may have negative consequences in the longer term [ ] . returning to work after a period of lockdown, while still in the state of pandemic, is also causing the workforce to exhibit symptoms related to ptsd, stress, anxiety, depression and insomnia [ ] . the current covid- situation is also affecting the mental well-being of health care workers, who are at a high risk of psychological distress [ ] , especially those who are experiencing physical symptoms [ ] . the situation is further worsened by the recommended avoidance of inperson contact and fear of infection, as people with anxiety and other mental disorders might not be able to consult with a mental health professional. there are various barriers to mental health care delivery, such as cost, stigma, lack of mental health care professionals, and distance from health care services [ , ] . mobile mental health or m-mental health, which uses mobile technologies for providing mental health services, has the potential to help overcome mental health care delivery barriers, as it provides anonymous access to care, low to no cost care, and remote communication. smartphones can be a convenient tool to reach a large number of people from different parts of the world. there are many mobile applications (apps) for mental health problems such as ptsd [ , ] , stress [ ] , depression [ ] and alcohol dependence [ ] , as well as other health issues such as obesity, that apps can help with, especially due to lack of exercise during circumstances similar to the current lockdown [ ] . smartphone apps have high rates of acceptance among the general public, and especially in young people [ ] due to its cost effectiveness [ ] . many studies have reported that apps have shown positive results in the treatment and management of anxiety [ , , ] . this study aims to analyze the functionality and characteristics of highly j o u r n a l p r e -p r o o f rated anxiety apps to identify users' preferred features and management methods delivered for anxiety with a smartphone or a tablet. for the purposes of this study, only free apps were selected, as recent statistics in march showed that . % of android apps and . % of ios apps were freely available worldwide [ ] . a total of apps, android apps, and ios apps were selected. the anxiety management approaches used in these apps among other aspects of functionality have been extracted and analyzed. this section presents the methodology that was followed in order to select and analyze android and ios anxiety apps. 
this paper follows the quality reporting guidelines set out by the preferred reporting items for systematic reviews and meta-analysis (prisma) group to ensure clarity and transparency of reporting [ ] . google play repository and the app store were used as sources to select anxiety apps. both app repositories are very popular with a high number of available health care apps: more than , apps are available in the google play store, and more than , apps are available in the app store [ ] . a general search string, composed of only one word "anxiety", was used. it was automatically applied to the titles and descriptions of android and ios apps. j o u r n a l p r e -p r o o f each app from the search result was examined by the first author to decide whether or not to be included in the final selection. the second author revised the final apps selection. the following inclusion criteria (ic) were applied: • ic : anxiety related apps in google play store and app store. • ic : apps that have a free version. • ic : apps that have + stars rating. ic reflects a level of user satisfaction with the app. the focus is on highly rated anxiety apps so as to discover the functionality features and characteristics that provide high user satisfaction. the following exclusion criteria (ec) were applied to the candidate apps to identify the final selection that would be included in this study: • ec : apps that have less than raters. • ec : apps that could not be installed. • ec : apps that crashed and could not be used after installation. apps that match any of the ec were excluded from the selection. ec is based on the heuristic guideline by nielsen [ ] , which recommends having five evaluators to form an idea about the problems related to usability. the apps' selection process was established as follows: . the search string was used to identify candidate apps in the google play store and app store in order to create a broad selection from which to choose from. . ic were used to identify relevant apps. . apps that met one or more of the ec were excluded. the above actions were carried out in march . a final selection of android apps and ios apps was identified after application of ic and ec. fig. presents the selection results. data collection was carried out using the data extraction form presented in table . each app was installed and assessed to explore its functionality features and characteristics. the devices used for the apps' assessment were: oppo a (android ), and ipad (ios ). a template was designed in an excel file to provide basic information about the apps as well as specifying their main features and functionality characteristics. some of these characteristics and functionality features were retrieved from the app's description available in the app repository. this section presents and discusses the results of this study. a total of apps, android apps, and ios apps were identified as both free and highly rated apps. tables a. , a. , a. , a. , a. and a. in appendix present general information about the apps such as name, link, rating, number of raters, number of installations (not available for ios apps), and date of latest update. the majority of the selected apps ( %) offer in-app purchases for paid features and functionality. these apps are free to download and use, but many of their proposed functionality features are not available without purchase. thus, it can be said that users may not fully benefit from the app unless they purchase these specific features. 
however, it should also be noted that in-app purchases are a way for many developers to monetize their work j o u r n a l p r e -p r o o f apps general information: -name of the app. -date of the latest update. -users rating (scored out of ): to report the level of user satisfaction from the apps. -number of raters: to report the number of raters satisfied with the app. -number of installations (not available for ios apps): to identify the most installed apps. -in-app purchase: to identify whether free apps charge users for certain functionality features. -management method: to identify management and treatment methods for anxiety that could be delivered through an app, and the most used ones in the available apps. -intervention approach: to identify approaches that could be transmitted through an app, and the most followed approaches in the available apps. -targeted mental problem/symptoms: to identify anxiety related issues addressed by the apps and issues that might be managed with similar management methods and approaches as ones for anxiety, as well as to identify problems that could be treated and managed through apps. -involvement of mental health care professional: this information was extracted from apps' descriptions in-app repositories and from apps' content. we consider mental health care professionals to be those professionals with a mental health background including psychiatrists, therapists, counselors and experts in psychological issues or management methods. -physical health information such as hr and bp: to identify whether the app relies on physical indicators to assess the mental status of the user. -authentication method: to identify if the app provides users with the option to keep their personal health data inaccessible to other users of the same device. -gamification features: to identify whether gamification features are included in the app to encourage and motivate the users to keep using it. -social features which might include: links to communities, associations, and centers; interoperability with other apps or websites; the possibility to share content via social networks (sn); and contact information in case of emergencies. -languages: identify the availability of the apps in multiple languages, which reflects the degree of internationalization of the app. -offline availability: identify whether the app can be used without internet access. [ ] . the free version of the app is used by many developers as an advertisement tool to attract users into purchasing and unlocking more features [ ] . free apps with in-app options are becoming the norm in-app markets. in , in-app purchases accounted for more than % of ios app revenue in the us and % of revenue in asia [ ] . the majority of the selected apps ( %) updated their functionality and content in the three first months of . this could be linked to the current covid- pandemic situation. on december st, the who china office was informed of a number of pneumonia cases from an unknown cause, that were later linked to the coronavirus [ ] , which has now spread to all regions of the world [ ] . to limit the spread and risk of the virus, the who advised the public to practice social distancing and to stay home [ ] . many countries have declared obligatory lockdowns and people were quarantined, which has created a state of fear and worry that has elevated many individuals' anxiety and stress. various existing anxiety apps have, thus, been updated to include covid- related content. 
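the data extraction form described above can be mirrored directly in a small data structure filled in while assessing each installed app; the sketch below is an illustrative assumption of how such a record could look in c#, with field names that simply restate the form items rather than reproduce the authors' spreadsheet template.

using System.Collections.Generic;

// Illustrative record mirroring the data extraction form: one instance per assessed app.
public class AppRecord
{
    public string Name;
    public string LatestUpdate;                 // date of the latest update
    public double UserRating;                   // user rating, scored out of 5
    public int NumberOfRaters;
    public long? NumberOfInstallations;         // not available for iOS apps, hence nullable
    public bool InAppPurchases;
    public List<string> ManagementMethods = new List<string>();      // e.g., meditation, breathing exercises
    public List<string> InterventionApproaches = new List<string>(); // e.g., mindfulness, CBT
    public List<string> TargetedProblems = new List<string>();       // anxiety plus co-addressed issues
    public bool MentalHealthProfessionalInvolved;
    public bool UsesPhysicalHealthData;         // e.g., HR or BP readings
    public string AuthenticationMethod;         // e.g., "none", "email", "nickname and password"
    public List<string> GamificationFeatures = new List<string>();
    public List<string> SocialFeatures = new List<string>();
    public List<string> Languages = new List<string>();
    public bool OfflineAvailability;
}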
table presents various management methods identified in the selected anxiety apps, with meditation and breathing exercises being the most common. (table: management methods and the apps offering each of them — meditation; breathing exercises; games; assessment tests; stories; mindfulness practices; guided relaxation; community chats with app users via the app; yoga and physical exercises; motivational and inspirational statements; online therapy and coaching; recommending activities and tips; interactive messaging.) the main goal of meditation is to help the user enter a deep state of relaxation or a state of restful alertness. it helps to reduce worrying thoughts, which play a key role in symptoms of anxiety, and bring about a feeling of balance, calmness, and focus [ ] . several studies have presented evidence supporting the use of meditation in anxiety treatments [ , , ] . one study reported that it was beneficial for a group of chinese nursing students in reducing anxiety symptoms and lowering systolic bp [ ] . another study reported that it showed improvements in the reduction of anxiety for breast cancer patients [ ] . a meta-analysis of controlled trials for the use of meditation for anxiety also reported a level of efficacy of meditative therapies in reducing anxiety symptoms [ ] . additionally, meditation has been shown to be effective in managing various types of anxiety such as panic disorder and agoraphobia [ ] . breathing exercises are another mechanism that can help to relax and relieve stress. while practicing deep breathing, a message is sent to the brain to calm down and relax. biochemical changes subsequently decrease hr and bp and help the person to relax [ ] . studies have shown that breathing exercises can improve cognition and overall well-being [ ] , while also reducing anxiety [ , , , , ] . breathing exercises can also have a positive impact on psychological distress, quality of sleep [ ] , depression [ , , ] , everyday stress, ptsd, and stress-related medical illnesses [ , ] . breathing exercises are also used to help with asthma, which was the case in a and a . however, it should be noted that such exercises may help patients whose quality of life is impaired by asthma, but they are unlikely to reduce the need for anti-inflammatory medication [ ] . many of the selected apps provided educational content about anxiety and other mental issues, symptoms, and management methods, either in the form of courses, articles, videos, or others. educating users about anxiety can help to reassure them and provide them with the necessary knowledge by answering questions and correcting misinformation that they might have.
educating users about the provided management method and its benefits may also increase their trust in the management approach and their willingness to try it. mental assessment tests have been provided by some apps to give the user an idea about his/her mental status, anxiety, stress and/or depression levels. relaxing music and sounds, is a noninvasive and free of side-effects ap-proach that has been used in apps as a management method. it has been shown to be an effective tool for the reduction of anxiety, stress, and depression [ , ] . it has also shown positive results in the prevention of anxiety and stress-induced changes like hr and bp [ ] . developers should take into account the type of music and sounds used, as well as the accompanying environment, as they both affect the effectiveness of this method [ , ] . thirty-one apps provided journaling and writing diaries to help users plan their day, track their mood, and express their thoughts, feelings, and emotions. securing the privacy and confidentiality of users' information is critical in such apps. all selected ios apps providing journaling provide authentication methods, while only % of android apps with this functionality provide users with the same level of authentication. eleven apps provide the user with the possibility of communicating with other users. in these apps, users are able to share their experiences, talk about their issues, help each other, and relate to others who are undergoing similar problems as their own. in the current covid- pandemic, being in a state of isolation but having the ability to connect with an online community can be very helpful. the idea of enabling interaction with a community of people with similar issues is quite interesting and can be extremely helpful, especially given that people with anxiety often tend to avoid direct communication [ ] . for users who prefer communication with mental health care professionals, there are ten apps available that provide online therapy and coaching, enabling users to communicate with mental health care professionals, without having to travel, while also avoiding obstacles like stigma and distance. selected apps offering online therapy services charge fees for these services. these apps also provide information on the mental health care professionals' credentials. this information is important as it allows the user to check whether these professionals are appropriately accredited and decide which mental health care professional is most suited for his/her needs. thirty-three apps provide users with games like coloring books, puzzles, and slime simulations, as management methods for anxiety. these games help the user to relax, and to take his/her mind off worrying thoughts or feelings. games are usually enjoyable and entertaining and this may motivate users to continue using these apps. the variety of management methods identified in the selected apps points to the high potential of apps usage for coping with anxiety. developers have integrated various promising and effective management methods in their apps' functionality features. users can access these features at any time and in any place. this could be beneficial for users with anxiety disorders, especially in situations where immediate help is needed (e.g., during panic attacks), or in cases where mental health care professional cannot be reached due to circumstances like distance or the current global lockdown situation. 
table presents the selected apps which state the use of specific intervention approaches for anxiety management. the most used ones included mindfulness, cognitive behavioral therapy (cbt), and hypnosis. mindfulness was the most adopted management approach. it is defined as "bringing one's complete attention to the present experience on a moment-tomoment basis" [ ] . mindfulness practices allow practitioners to shift their concentration to their internal experiences occurring in each moment, such as anxiety and mood problems [ , ] , and improving an individual's internal cognitive, emotional, and physical experience [ ] . some findings suggest that mindfulness can be more complicated than it might seem, as many el-ements like attention emotional balance, differences in emotion-responding variables, and clinical context can influence its effect [ , , ] . therefore, these elements should be taken into account while developing mindfulnessbased anxiety apps. cbt is a form of psychological treatment, mainly based on efforts to change thinking patterns [ ] . many studies have supported the effectiveness of cbt-based interventions for the treatment of anxiety, and have reported on the long-term positive effect it has on both children and adults [ , ] . a study examining available evidence on cbt have yielded positive results and confirmed its effectiveness for anxiety disorders [ ] . cbt has also been used in the treatment of some specific anxiety disorders like ptsd [ ] and ocd [ ] . it has also been proved effective for depression, alcohol and drug use problems, eating disorders, and severe mental illness [ ] . cbt and mindfulness-based therapy can also be useful in reducing anxiety during the covid- pandemic [ ] . hypnosis is a therapeutic technique designed to bring relaxation and focus to the mind [ ] . many studies have reported the effectiveness of hypnosis for the treatment of anxiety. one study stated that it can reduce anxiety among palliative care patients with cancer [ ] , and another reported on its considerable benefits to terminally ill patients [ ] . hypnosis is also used to treat and manage stress and phobias [ ] , as well as sleep and physical symptoms [ ] . other approaches have also been identified in the selected apps as shown in table , but it should be noted that a few of them were not based on scientific approaches. table presents the different health issues besides anxiety that were addressed by the selected apps. all selected apps addressed general anxiety. some apps addressed specific types of anxiety like social anxiety, separation anxiety, performance anxiety, ocd, ptsd, and panic attacks. focus and concentration a , a , a , a , a , a , a , i , i , i self-esteem and confidence a , a , a , a , a , a , a , i , i , i , i , i pain a , a , a , a , a , a , a , i , i mood a , a , a , a , a , a , a , a , a some apps addressed other mental and physical issues, which usually occur with anxiety like stress [ , ] , sleep issues [ ] , and depression [ , , ] . some apps used management methods to treat addiction-related issues, eating disorders [ ] , phobias, [ ] , and asthma [ ] . the majority of the apps do not use physical health information. hr and bp are impacted by anxiety and stress [ ] . both can be used by apps to indicate the anxiety level of the user [ ] . yet in our selection only two apps provided this functionality feature (a and a ). 
a collects data on hr variability, using the photoplethysmogram (ppg) technique to get insights on the user's health, including stress, energy, and productivity levels. the app also allows the user to manually enter bp as a convenient way of journaling. it should be noted that a provides cardiovascular tests, including hr and peripheral blood circulation, as an app purchase option. only % of the selected apps reported involvement of mental health care professionals as presented in table . apps providing online therapy specified information about the therapists that the user can contact. this information includes their specialty, experience, and diplomas. some apps shown in table provided names of the professionals involved in their co-creation. providing names gives the user the possibility to look online for the credentials of the involved professionals and might increase the user's trust toward these apps. we cross-checked the names displayed in table and found them to be legitimate. table table presents the authentication methods identified in the selected apps. the majority of the selected apps ( %) do not require authentication. the absence of authentication might give the user a sense of anonymity. however, authentication can help the user ensure the privacy of his/her data. the app a requests a nickname and a password, ensuring security and confidentiality as well as keeping the anonymity of the user, since it does not use any information or sources that could reveal the identity of the user like facebook account, google account, or email. nickname and password a gamification is the use of game elements in non-gaming systems which are mainly used to improve user experience and user engagement [ ] . table presents the different gamification methods identified in the selected apps. note that some apps use more than one gamification method. the majority of the selected apps used gamification features to encourage and motivate the user. creating a fun, interactive user experience with the adoption of game elements can create an enjoyable user experience, which can further reduce boredom and motivate users keep using the app. this can also increase user engagement, leading to users providing more accurate information about their mental health status and to increased benefit for the user from the provided mental health care management method. gamification is a widely used approach that has shown effectiveness with anxiety and other mental health problems, such as depression and ptsd for military personnel [ , ] , and aggression for veterans [ ] . combining j o u r n a l p r e -p r o o f game a , a , a , a , a , a , a , a , a , a , a , a , a , a , a , a , i , i , i , i -i , i , i , i , i , i -i , i , i , i graphics a , a , a , a , a , a , a , a , a , a , a , a , a unlocking new features a , a , i , i , i score and points a , a , a , i stickers, awards and stars a , a , a , a , i game elements and knowledge on game players' behaviors with known mental health care management methods is an interesting approach that can result in the creation of effective anxiety apps. table presents the different social features provided by the selected apps. many apps provide social and communication features, which allow the user to connect with communities of app users as well as with centers and associations, or with others to share content and progress. those social features could prove to be beneficial to the user. 
for instance, sharing progress and content from the app via social networks (sn) and emails helps provide social support to the user from family and friends. social support is significantly associated with well-being and absence of psychological distress [ ] . it has a favorable effect on certain psychological issues [ ] , and can serve as a mediator to stress and anxiety caused by life events [ ] . providing social support is also among the behavioral change techniques implemented in m-health apps to promote app usage [ ] . additionally, providing contacts in case of emergencies is crucial and might help the user in critical situations j o u r n a l p r e -p r o o f where he/she feels the need for immediate help. links to associations, websites, and centers can provide the user with more helpful resources. social features are very important as they help the user connect with others in a beneficial way. emergency contacts' information a , a , a , a , a , a , a , a , a group treatment i , i table presents the languages available in the selected apps. the majority of the apps ( app) are available only in english, which can be explained by the fact that the search string applied in app repositories was in english. only one app (a ) automatically translates its content to the device's preferred language. while the rest of the apps are available in more than one language. availability in multiple languages can help reach a larger number of users. i , i -i , i , i , i , i , i , i , i , i -i , i -i , i , i , i -i more than one language a , a , a , a , a , a -a , a , a , a , a , a , a -a , a , a , a , a , a , a , a , i , i , i , i , i , i -i , i , i , i , i , i , i system's languages a j o u r n a l p r e -p r o o f table shows whether an app requires internet access to function or not. internet access is required to install and create accounts for all apps, but once that is done, many apps function without internet access. offline availability is an aspect that will help users benefit from the app without necessarily being in a setting with internet access. this will decrease the app's limitations and make it more accessible to users. however, some of the management methods identified do require internet access, like online therapy and communication with communities of app users. additionally, offline availability may require downloading more data that could be permanently stored, which may affect a phone's memory and performance. some apps were only partially available offline, resulting in limited functionality when internet access was not available. other apps only made downloaded data available offline, meaning the user chooses and downloads content that he/she wants to be available while offline. these are convenient solutions to offline availability that do not compromise on app functionality. this study is subject to limitations, such as: (i) missing terms (e.g., stress, depression) in the search string that might have resulted in the selection of relevant apps, as usually an app targets more than one mental health issue. however, the search string used identified any app that mentions anxiety in its title and/or description, therefore this can alleviate the threat of missing relevant apps; and (ii) the first author conducted the search and applied the ec and ic to the initial selection. however, the final selection has been reviewed by the second author. 
with the current development in mobile communication and the wide ownership of mobile devices, m-mental health seems to be one of the most promising ways to deliver care to people in need regardless of their situation. under certain circumstances like the current covid- pandemic, the use of mobile communication and apps for anxiety might become a necessity. panic attacks can mimic covid- symptoms, which might worsen the condition of people with anxiety disorders [ ] . having an app on hand that can ease anxiety in such circumstances is useful. this study highlights the functionality and characteristics of anxiety apps that are well rated by users. we plan to build on the reported findings to develop a reusable requirements catalog for anxiety apps. mental health care professionals and people with anxiety disorders will be involved in the co-creation of this catalog. the catalog will also include software quality requirements based on the iso/iec standard and recommendations from the uk national health service (nhs) and the health insurance portability and accountability act (hipaa) on health apps. since the reusable requirements catalog for anxiety apps will be based on functionality of existing highly rated apps, as well-being based on inputs from mental health care professionals and people suffering from anxiety, it could be used to assist developers to select relevant requirements for anxiety apps. apps could therefore be designed based on the catalog to assist people dealing with anxiety. requirements from the catalog could also be used to generate checklists for audit and evaluation purposes [ ] , either to evaluate apps or to compare their functionality and characteristics. the findings from this study may also assist researchers and developers interested in the field of m-mental health, especially in the sub-field of anxiety, to have an overview of the characteristics and functionality of existing highly rated apps for anxiety. our findings could also assist mental health professionals to find anxiety apps that could be integrated in their mental health care process, as well as assist people suffering from anxiety to find mobile apps best suited for their needs. during the covid- pandemic, mhealth can also help disseminate health information among health personnel and community workers [ ] . all authors contributed to the creation of the manuscript. nd: design, conception, acquisition and interpretation of data, classification of selected apps, drafting of the manuscript, revision. so: design, conception, statisti- j o u r n a l p r e -p r o o f what was already known on the topic: -anxiety disorders are a common mental issue. -there are many barriers to mental health care delivery, mainly cost, stigma and distance from health professionals. -apps were found to be effective tools to deliver mental health care, and overcome the aforementioned barriers. what this study added to our knowledge: - free and high-rated anxiety apps were analysed: android apps, and ios apps. -anxiety apps addressed other health issues, such as: stress, depression, sleep issues, and eating disorders. -anxiety apps adopted various management, treatment and coping approaches such as, meditation, breathing exercises, mindfulness and cognitive behavioral therapy. cal support, interpretation of data, drafting of the manuscript, critical revision. maji and mg: critical revision. all authors read and approved this manuscript. the authors have no conflict of interest. 
this article does not contain any studies with human participants or animals. j o u r n a l p r e -p r o o f what to know about anxiety everything you need to know about anxiety share of the population worldwide who suffered from anxiety disorders from what are anxiety disorders? anxiety causes immediate psychological responses and associated factors during the initial stage of the coronavirus disease (covid- ) epidemic among the general population in china a longitudinal study on the mental health of general population during the covid- epidemic in china anxiety on rise due to coronavirus, say mental health charities mental health considerations during covid- outbreak the psychological impact of quarantine and how to reduce it: rapid review of the evidence how to manage stress and anxiety from coronavirus (covid- do psychiatric patients experience more psychiatric symptoms during covid- pandemic and lockdown? a case-control study with service and research implications for immunopsychiatry how the covid- pandemic affects people with social anxiety is returning to work during the covid- pandemic stressful? a study on immediate mental health status and psychoneuroimmunity prevention measures of chinese workforce a multinational, multicentre study on the psychological outcomes and associated physical symptoms amongst healthcare workers during covid- outbreak barriers to mental health care: perceived delivery system differences barriers to mental health treatment: results from the who world mental health surveys mobile apps for post traumatic stress disorder gamification-based apps for ptsd: an analysis of functionality and characteristics gamification in stress management apps: a critical app review a systematic review of cognitive behavioral therapy and behavioral activation apps for depression the alcohol tracker application: an initial evaluation of user preferences online and smartphone based cognitive behavioral therapy for bariatric surgery patients: initial pilot study receptiveness and preferences of health-related smartphone applications among vietnamese youth and young adults enabling psychiatrists to be mobile phone app developers: insights into app development methodologies adding a smartphone app to internet-based self-help for social anxiety: a randomized controlled trial effectiveness of an app for reducing preoperative anxiety in children: a randomized clinical trial adoption of mobile apps for depression and f anxiety: cross-sectional survey study on patient interest and barriers to engagement statista. 
free and paid app distribution for android and ios the prisma statement for reporting systematic reviews and metaanalyses of studies that evaluate health care interventions: explanation and elaboration number of mhealth apps available in the apple app store from st quarter to rd quarter how to conduct a heuristic evaluation effect of perceived value and social influences on mobile app stickiness and in-app purchase intention how the most successful apps monetize their user bade novel coronavirus ( -ncov) situation report- coronavirus disease (covid- ) situation report- coronavirus disease (covid- ) advice for the public effectiveness of a meditation-based stress reduction program in the treatment of anxiety disorders effects of meditation on psychological and physiological measures of anxiety meditation therapy for anxiety disorders a randomized controlled trial of the effects of brief mindfulness meditation on anxiety symptoms and systolic blood pressure in chinese nursing students effects of meditation on anxiety, depression, fatigue, and quality of life of women undergoing radiation therapy for breast cancer meditative therapies for reducing anxiety: a systematic review and meta-analysis of randomized controlled trials stress management: breathing exercises for relaxation effect of short-term practice of pranayamic breathing exercises on cognition, anxiety, general well being and heart rate variability impact of jacobson progressive muscle relaxation (jpmr) and deep breathing exercises on anxiety, psychological distress and quality of sleep of hospitalized older adults sudarshan kriya yogic breathing in the treatment of stress, anxiety, and depression: part i-neurophysiologic model effects of a relaxation breathing exercise on anxiety, depression, and leukocyte in hemopoietic stem cell transplantation patients effectiveness of controlled breathing techniques on anxiety and depression in hospitalized patients with copd: a randomized clinical trial self-regulation of breathing as a primary treatment for anxiety breathing exercises for asthma: a randomised controlled trial effect of music on anxiety, stress, and depression levels in patients undergoing coronary angiography buela-casal, acute stress recovery through listening to melomics relaxing music: a randomized controlled trial relaxing music prevents stress-induced increases in subjective anxiety, systolic blood pressure, and heart rate in healthy males and females effects of relaxing music on cardiac autonomic balance and anxiety after acute myocardial infarction relaxing music for anxiety control bridging differences: effective intergroup communication mindfulness and meditation wherever you go, there you are: mindfulness meditation in everyday life skills training manual for treating borderline personality disorder mindfulness training as a clinical intervention: a conceptual and empirical review the effect of mindfulness-based therapy on anxiety and depression: a meta-analytic review mindfulness meditation, anxiety reduction, and heart disease: a pilot study mindfulness and anxiety disorders: developing a wise relationship with the inner experience of fear how does mindfulness reduce anxiety, depression, and stress? 
an exploratory examination of change processes in wait-list controlled mindfulness meditation training affective reactivity mediates an inverse relation between mindfulness and anxiety mindfulness and emotion regulation in depression and anxiety: common and distinct mechanisms of action what is cognitive behavioral therapy? long-term effectiveness of cbt for anxiety disorders in an adult outpatient clinic sample: a follow-up study cognitive behavioral therapy in anxiety disorders: current state of the evidence treating the trauma of rape: cognitivebehavioral therapy for ptsd cognitive-behavioral therapy for ocd mental health strategies to combat the psychological impact of coronavirus disease (covid- ) beyond paranoia and panic hypnosis and relaxation therapies a hypnotherapy intervention for the treatment of anxiety in patients with cancer receiving palliative care use of hypnotherapy in anxiety management in the terminally ill: a preliminary study the clinical use of hypnosis in cognitive behavior therapy: a practitioner's casebook gamification. using game-design elements in non-gaming contexts a virtual reality exposure therapy application for iraq war military personnel with post traumatic stress disorder: from training to toy to treatment cognitive processing therapy for veterans with military-related posttraumatic stress disorder virtual reality and cognitivebehavioral therapy for driving anxiety and aggression in veterans: a pilot study social support and mental health in community samples effects of social support and personal coping resources on depressive symptoms: different for various chronic diseases? social network mediation of anxiety behavior change techniques in top-ranked mobile apps for physical activity popular science. a panic attack can mimic the symptoms of covid- . here's what to do about it e-health internationalization requirements for audit purposes coverage of health information by different sources in communities: implication for covid- epidemic response deep key: cord- -ryyokrdx authors: baron, lauren; cohn, brian; barmaki, roghayeh title: when virtual therapy and art meet: a case study of creative drawing game in virtual environments date: - - journal: nan doi: nan sha: doc_id: cord_uid: ryyokrdx there have been a resurge lately on virtual therapy and other virtual- and tele-medicine services due to the new normal of practicing 'shelter at home'. in this paper, we propose a creative drawing game for virtual therapy and investigate user's comfort and movement freedom in a pilot study. in a mixed-design study, healthy participants (n= , females) completed one of the easy or hard trajectories of the virtual therapy game in standing and seated arrangements using a virtual-reality headset. the results from participants' movement accuracy, task completion time, and usability questionnaires indicate that participants had significant performance differences on two levels of the game based on its difficulty (between-subjects factor), but no difference in seated and standing configurations (within-subjects factor). also, the hard mode was more favorable among participants. this work offers implications on virtual reality and d-interactive systems, with specific contributions to virtual therapy, and serious games for healthcare applications. virtual reality (vr) is a computer-generated simulation of a d environment that users can immerse themselves into and interact with via hardware (headset, controllers, joystick, treadmill, etc.). 
given that most people in the world are experiencing stressful life changes under the covid- pandemic crisis, we investigate how to integrate vr into at-home therapy. vr has been successfully used within rehabilitation settings for motor learning, impaired cognition, obesity, and overall health and wellness [ ] . in this paper, we introduce a creative drawing game for virtual therapy and investigate users' comfort, range of motion and movement in multiple scenarios and configurations in a pilot study. this game allows the user to be fully engaged in both physical and mental stimulation. figure demonstrates an overview of the game with a user performing one of the therapeutic tasks while standing. the game encourages broad arm motions while still being entertaining as the user strives to connect the dots of the drawing. a creative drawing game modelled on connect the dots allows for familiarity and ease while playing - users are not overwhelmed by a game that feels foreign to them. the working hypothesis of this study was that our creative drawing vr game would be effective when integrated into therapy, as assessed by task completion time (tct), accuracy based on a lower number of mistakes, and user experience (ux). more specifically, the research questions that inspired the study were: does the vr therapy game improve users' range of motion and reach? is there any difference between the complexities of the easy/hard levels in the game based on tct and accuracy? are there any differences in tct and accuracy based on the game configurations of seated and standing? do our users enjoy playing the game and recommend it to their peers? if so, which configurations are the most popular ones? figure : overview of a user playing the creative vr therapy game. though vr started out as a form of entertainment, it has grown to have implications in the medical field, from simulating surgery for surgeons [ ] to attenuating patient pain during chemotherapy [ ] . one of the industries vr is advancing is physical therapy. vr is preferred in rehabilitation because of its portability so that patients can take the therapy home, its ability to attenuate pain, its independence from external pressures and distractions, and its game-like characteristics that engage users. for example, vr was proven to be better at reducing phantom limb pain than other distraction methods [ ] . because the patients used vr exercise imagery, it stimulated the same brain regions that are responsible for actual movement. therefore, pain was reduced due to pain distraction and punctually activated brain regions involved in the pain matrix network [ ] . another example is how stroke patients report physical pain and an inability to concentrate during their rehabilitation without vr [ , ] . not only does vr make therapy movements less painful, but it also helps patients regain movement that was lost (i.e. after a stroke) and extend their range of movement. by transforming rehabilitation into an entertaining game, the intense, repetitive, task-oriented arm exercises become more engaging and provide a more positive experience for both the patient and their therapist [ ] . vr improves movement range and reduces pain in the upper extremities (ue). our paper looks at the difference between seated and standing vr regarding the dynamics of movement and comfort in a creative vr therapy game. the pros and cons of seated vr and standing vr have been studied closely, yet still stimulate further research and discussion [ ] .
there are many reasons why people chose seated vr over standing vr and vice versa. some users prefer to be seated at a desk because they can have an interactive surface (desk) to perform their task on [ ] ; it is more comfortable and less prone to fatigue to be seated than walking around during long durations [ ] ; it is more suitable for those with a sedentary lifestyle or mobility-impairments [ ] ; it reduces the risk of injuries due to falls, motion sickness, or hitting other objects [ ] ; and it makes them users feel less vulnerable and more acceptable to use vr [ ] . however, standing vr gives users better range for full-body gestures, better performance, and better interactions and locomotion within the d environment [ ] . in addition, developers lean towards seated vr because hand/object tracking can be easier when the user's overall movements are restricted in space [ ] ; many leading vr products appear to be designed for seated configuration [ ] ; it is more suitable for small or cluttered spaces [ ] ; and it is less likely for users to be entangled in the chords/cables [ ] . there are many numerous hardware and layout configurations for vr, and there is still a lot of progress to be made towards best practices of user comfort, and movement assessment of users in seated vr, particularly in virtual therapy. we aim to evaluate what configuration is best for virtual therapy for ue mobility by measuring accuracy and tct, visualizing their movements through data tracking, and evaluating their responses to the exit survey. investigating the best practices for vr therapy is in high demand. the coronavirus (covid- ) pandemic is a global health emergency currently involving countries with > , , infections confirmed and > , deaths worldwide [ ] . however, covid- has affected nearly every person mentally and emotionally. the impact on mental health concerns not only medical staff, who are working nonstop in a highstress and high-risk health environment, but also millions of people forced into isolation/quarantine [ , ] . availability of physical therapy services in the community-even for urgent concerns-has decreased during the covid- pandemic, as opinions about whether home-and community-based physical therapy (pt) should remain open are mixed [ ] . because of stay-at-home mandates, patients must choose to take their pt home or risk exposure going to one of the scarce pt services that are open amidst the pandemic. by continuing their pt at home, patients reduce the risk of hospitalization or other forms of care-both essential public health goals during a viral pandemic that is currently overwhelming hospital and nursing home capacity [ ] . it is necessary to increase remote access to care while preserving scarce resources, including personal protective equipment [ ] . without integrating remote rehabilitation options, such as vr, telehealth services, and digital practices, practitioners may disproportionately harm the most vulnerable patients, send a troubling message to the general public about the value of physical therapists, or worsen the potential short-term and long-term mental health consequences related to this global emergency [ , ] . we address the problem of how configuration contributes to user performance and range of motion in the vr environment. our studies were inspired by several previous experiments that investigated the significance of body position while using vr. 
there are several vr systems based on different user body configurations: seated [ ] , leaning while seated [ ] , standing [ ] , leaning while standing [ ] , walking in place [ ] , etc. kruijff et al. studied how leaning configurations affect vr performance [ ] . they tested both static leaning (keeping a tilted posture throughout the whole trial) and dynamic leaning (their upper-body inclination changes dynamically throughout the trial) with the leaning angles of forward, upright, and backwards. while the dynamic leaning data did not produce substantial results, the static leaning showed that leaning does improve accuracy, tct, and range of movement. their conclusions were based off the positive effect that leaning while seated had on self-motion perception. self-motion perception is how users use sensory cues to be immersed in their vr environment; it increases task performance because users feel part of the virtual environment and perceive cues that anchor them to the real world [ , ] . range of motion while using vr is important to study because if we can find the best configuration for users to move freely in, their user performance and comfortability while completing the task will improve. also, being able to use your body to navigate within the virtual environment allows your hands to be free to complete tasks. for instance, using d devices (joystick, keyboard, etc.) to move around is not practical. in most applications of vr, ground navigation is not the primary action the user has to perform, so the system should keep users' hands available to use for tasks other than ground navigation [ ] . by moving in the virtual environment using other body parts, the user's hands, eyes, and local head orientation are completely free and available for other physical or social interactions. lazynav looked at how moving both the upper and lower extremities affected user performance [ ] . participants tested combinations of these motions while standing: bend bust, lean bust, rotate shoulders, rotate hips, bend hips, bend knees, take a step. one-way anova tests showed that there were significant differences of all the motion pairs when they measured their movement distance, tct, and accuracy. with their quantitative and qualitative data, they were able to suggest which general body motions were easier to perform and more comfortable for their "lazy" vr design. this shows how a user's range of motion affects how they perform in their task and how easy/comfortable they perceive the game. when we evaluate how seated versus standing configurations, we will measure both user performance and user range of motion with quantitative (movement distance, tct, accuracy, exit survey data) and qualitative data (user suggestions). from gathering data on what configuration allows for the best range of motion during a vr game, we can propose the best configuration a patient should be in for vr physical therapy. one of the biggest advantages of using vr for pt is that it is portable; patients can take their therapy home to do it frequently at their convenience [ ] . a virtual reality therapy home-based system (vrt-home) was developed for children with hemiplegic cerebral palsy (cp) to practice hemiplegic hand and arm movements; children with cp have a brain injury in the motor cortex that impairs the opposite ue [ ] . their results showed that the system successfully targeted hand/arm movements of the hemiplegic ue, especially reaching activities that involve the shoulder and elbow. 
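to illustrate how the quantitative measures mentioned above (movement distance and tct) could be collected inside a unity application, the following is a minimal c# sketch that accumulates the drawing controller's path length frame by frame and timestamps the task; the component and field names are illustrative assumptions and this is not the study's actual logging code.

using UnityEngine;

// Illustrative Unity component for per-trial movement logging.
public class MovementTracker : MonoBehaviour
{
    public Transform controller;        // drawing-hand controller, assigned in the inspector

    public float PathLength { get; private set; }   // total distance travelled by the controller
    public float CompletionTime { get; private set; }

    private Vector3 lastPosition;
    private float startTime;
    private bool running;

    public void BeginTrial()
    {
        PathLength = 0f;
        lastPosition = controller.position;
        startTime = Time.time;
        running = true;
    }

    public void EndTrial()
    {
        CompletionTime = Time.time - startTime;   // task completion time (tct) in seconds
        running = false;
    }

    void Update()
    {
        if (!running) return;
        // accumulate frame-to-frame displacement of the controller as a proxy for range of motion
        PathLength += Vector3.Distance(controller.position, lastPosition);
        lastPosition = controller.position;
    }
}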
additionally, the child participants reported "[having] lots of fun" and said they would like to take the games' practice therapy activities home to play. patients enjoy using vr in their pt for their upper limbs; it is effective, enjoyable, and portable. however, this study only used seated configurations for their participants. we look at seated and standing configurations for a therapy game that works primarily on ue rehabilitation. we also consider how the therapy game will be received as a remote therapy tool during the covid- pandemic and "shelter at home". the secret garden is a -minute self-help vr protocol made to reduce the burden of the coronavirus [ ] . vr is an effective tool for the prevention and treatment of stress-related psychopathological symptoms and ptsd, with therapeutic benefits [ , ] . in this simulation, each user has a partner to discuss their emotions/reflections, and they perform different tasks related to personal identity and interpersonal relationships. it provides the sense of community that was taken away due to isolation and quarantine and provides an outlet to manage one's stress. its biggest takeaways were the flexible use, high level of autonomy, and lower costs. gao et al. explore how physical activity vr programs reduce stress and promote health and wellbeing in older adults [ ] . our creative drawing game combines ue movements with a creative release to enhance user experience and health. the goal of this study is to compare the two configurations of seated versus standing vr in body movements, user-friendliness, comfort and immersion during a creative drawing vr therapy game. our creative drawing game was coded in c# using the unity game engine. our game is compatible with several vr head-mounted displays (hmds), but we have chosen windows mr for our study as a more portable candidate in comparison to other consumer-level hmds. another motivation for choosing windows mr is its easy setup, which will be extremely important in the future for tele-rehabilitation use of the game at home by ordinary, first-time vr users. in our animal drawing game, the user can choose whether they want to participate in the easy level or hard level. the goal is to connect the dots of an outline drawing of either a fish (easy level) or a chicken (hard level) with a virtual paint brush. the background is a simple, serene mountain scape with a blue sky and clouds, allowing the user to focus on their task in a relaxing, distraction-free environment. when each dot is hit, it turns from red to green, and a positive audio feedback sound is played to the user. when all the dots are green, meaning the user successfully connected all the dots of the drawing, the user is celebrated with visually exciting animations. a walkthrough of each of the levels, based on their completion state, is provided in figure . gameplay is flexible; the user can switch controllers to draw with either their left or right hand. the controller that is not drawing can be used to adjust the d dots model to the height or position the user finds most comfortable, which allows this game to be played both seated and standing. no matter where they adjust the task to be, they are still reaching and moving their body to complete their drawing. we want to compare how far users can reach and how accurate their movements are while seated and while standing. 
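to make the game mechanics above concrete, the following is a minimal sketch of the dot-connection logic. it is written in python purely for illustration; the actual game is implemented in c# in unity, and the class names, hit radius, and coordinates below are hypothetical rather than taken from the authors' code.

```python
# hypothetical sketch, not the authors' c# code: the dot-connection logic
# described above. a dot flips from red to green when the brush tip comes
# within a hit radius, and the level is complete when every dot is green.
import math
from dataclasses import dataclass

@dataclass
class Dot:
    x: float
    y: float
    z: float
    hit: bool = False              # False = red, True = green

class DrawingLevel:
    def __init__(self, dots, hit_radius=0.05):
        self.dots = dots
        self.hit_radius = hit_radius   # assumed tolerance, in meters
        self.mistakes = 0              # missed dots recorded during the stroke

    def update(self, brush_pos):
        """called every frame with the current brush-tip position."""
        for dot in self.dots:
            if not dot.hit and self._distance(dot, brush_pos) <= self.hit_radius:
                dot.hit = True         # red -> green; play positive audio feedback here
        return self.is_complete()

    def register_miss(self):
        """a dot skipped during the continuous stroke counts as a mistake."""
        self.mistakes += 1

    def is_complete(self):
        return all(d.hit for d in self.dots)

    @staticmethod
    def _distance(dot, pos):
        return math.dist((dot.x, dot.y, dot.z), pos)

# example: a three-dot outline completed by sweeping the brush through each dot
level = DrawingLevel([Dot(0.0, 1.2, 0.5), Dot(0.1, 1.3, 0.5), Dot(0.2, 1.2, 0.5)])
for pos in [(0.0, 1.2, 0.5), (0.1, 1.3, 0.5), (0.2, 1.2, 0.5)]:
    done = level.update(pos)
print(done, level.mistakes)   # True 0 -> trigger the celebration animation
```

the same loop structure also supports the two-controller setup described above: whichever controller is drawing feeds its position into the update call, while the other controller only repositions the dots model.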
using qualtrics, we developed pre- and post-questionnaires. in the pre-questionnaire, all the participants were asked about their demographics, prior vr and video game experience, level of education, past injuries, and fitness level. after the participants completed their tasks, they were given an individualized link to the post-questionnaire based on which testing group they were in. we chose to use a five-point likert scale for our questions because it is one of the most fundamental and frequently used psychometric tools for research [ ] . they were all asked to rate their discomfort level, movement restrictions, and ease of completing the task using the likert scale. we then asked a variety of questions about their ux. some of our post-questionnaire questions were derived from a validated and unified ux questionnaire for immersive virtual environments [ ] . the pre-questionnaire allowed us to collect demographics and background data on our participant pool. we asked about demographics: age, gender, height, weight, ethnicity, and education level. this data helps us understand how diverse and representative the group we are collecting data from is. participants were also asked about vr usability: how often do you play video games; how much do you enjoy playing video games; have you ever used virtual reality headsets before. for those who have experience with vr, we then asked: what is the reason you used vr; did you enjoy using vr. this information indicates how receptive users will be to using a vr game for therapeutic activities. we also asked questions about user fitness: how frequently do you exercise a minimum of minutes per session; have you had a severe upper body injury, either due to sports or other accidents. data about users' fitness gives us context for how well users will perform in our therapy game, which targets the upper body. users who have had previous rehabilitation experience would be able to provide more insight on how our therapy game compares in its effectiveness and entertainment. the post-questionnaire provides insight on vr usability/ux. users were asked how strongly they agree with the following statements: it was easy to complete the virtual drawing task; i felt comfortable while completing the task; i felt my movement was restricted while completing the task; using the vr drawing activity, i did stretch my body more than i normally do; i enjoyed playing the creative drawing game. this data gives us a better sense of how well our game will be perceived as a creative therapy game for users with little vr experience. we also asked about presence and cybersickness to ensure the users were not distracted by external stimuli while completing the task in the vr environment: the sense of moving around inside the virtual environment was compelling; my interactions with the virtual environment seemed natural; i was completely captivated by the virtual environment; i still paid attention to the real environment; i suffered from fatigue, headaches, nausea, or eyestrain during my interaction with the virtual environment. learning that the user was immersed in the vr environment and was not distracted by external factors helps us rule out the possibility that confounding variables contributed to their tct and number of mistakes while playing. we can also assess how engaged users were with their therapeutic activities. finally, we asked users how likely they were to recommend this creative therapy game to friends or family members as a therapeutic exercise, particularly amidst covid- . 
we also collected text entries from participants, asking for thoughts on how to improve this activity for future use and their preferences for being seated vs. standing. a unique id was generated by each participant and was repeatedly used in completion of the questionnaires to help us keep track of their pre-, post-, and main intervention data while preserving their anonymity. design: we chose to use a x counter-balanced mixed design to test seated vr vs. standing vr and easy level vs. hard level. we used a within-subject design for the seated vs. standing configuration because it allows us to see how the same person responds differently given the different conditions of seated vs. standing, and used a between-subject design for the level complexities because it reduces confounding variables due to exposure to multiple treatments. the four testing groups following the mixed design that participants were randomly assigned to were as follows: users test the easy level seated first then standing; users test the easy level standing first then seated; users test the hard level seated first then standing; users test the hard level standing first then seated. procedure: the study was approved by the institutional review board (protocol #: xxxxxx). after consent, participants were randomly assigned to one of the four study conditions. we carefully instructed each participant what to do, and after they expressed that they fully understood, they were guided to a chair or the middle of the room to stand, depending on their condition. we then gave them the headset to put on and their hand controllers, and they started on either the easy or hard level using their dominant hand (figure ). we manually recorded how many times they made a mistake in their continuous drawing stroke (when a dot doesn't turn green because they missed it) and their task completion time. they were then asked to complete the post-questionnaire for the configuration they had just tested and repeated the task in the other configuration. after all their trials were completed, each participant filled out the exit survey about their experience using the game. we used stata software and rstudio to perform the tests and visualize the findings. we performed one-way anova tests to find a significant difference between the easy and hard levels based on our dependent variables of task completion time (tct) and number of mistakes. we also performed tests to look for any significant differences between the seated and standing configurations using the tct, number of mistakes, and post-questionnaire data. we also used a python program to collect data on the hand controller position, which we visualized using rstudio. the d visualization allows us to visually assess how accurate a drawing was compared to the provided outline. it also visualizes the range of their movements, showing how far they could reach while drawing, so that we can see which areas they struggled to reach. for example, in figure d , the visualization shows that the user was shakiest in the upper left corner of the fish and drew overlapping lines in the upper left corner of the chicken; we can speculate that the user's range of motion is not as strong on the upper left side. the quantitative and qualitative data we collected from the questionnaires help us determine whether their performance was affected by any confounding variables, such as distractions or cybersickness, and help us understand their impressions of the game. 
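as an illustration of the statistical comparison described above, the snippet below runs the same kind of one-way anova on task completion time and number of mistakes between the easy and hard levels. the authors ran their analyses in stata and rstudio; this python version using scipy is only a sketch, and every data value in it is an invented placeholder rather than study data.

```python
# illustrative only: one-way anova of the kind described above, comparing tct
# and mistakes between the easy (fish) and hard (chicken) levels.
# the numbers are fabricated placeholders, not actual study data.
from scipy import stats

tct_easy = [42.1, 38.5, 51.0, 45.3, 40.2]    # seconds, hypothetical
tct_hard = [88.4, 95.2, 79.9, 102.3, 91.7]   # seconds, hypothetical
f_stat, p_value = stats.f_oneway(tct_easy, tct_hard)
print(f"tct: f = {f_stat:.2f}, p = {p_value:.4f}")

mistakes_easy = [1, 0, 2, 1, 1]
mistakes_hard = [4, 6, 3, 5, 4]
f_stat, p_value = stats.f_oneway(mistakes_easy, mistakes_hard)
print(f"mistakes: f = {f_stat:.2f}, p = {p_value:.4f}")

# the same call is repeated for the seated vs. standing comparison within each
# level, and the logged controller positions can be plotted (e.g., with
# matplotlib's 3d axes) to inspect drawing accuracy and reach.
```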
the results from the participants ( females) who took part in the pilot study are reported here. a significant difference between the game's easy level (fish drawing) and hard level (chicken drawing) was observed for both task completion time (f( , )= . , p< . ) and number of mistakes (f( , )= . , p< . ). overall, the chicken model was more challenging and harder to complete, yet still entertaining; users spent more time completing it and made more mistakes while performing the task. no significant difference was observed between the seated and standing configurations for either tct or mistakes in either the easy or hard level of the game. figures and present a summary of these results for the four individual conditions and for the two levels of the game, based on tct and mistakes. one participant commented: "i prefer standing. it was a lot easier to move. when i was sitting i had to reach more and wanted to lift myself off the chair a bit to get to the highest parts of the chicken. but since i couldn't i had to really stretch my arm and controller out to hit the dots." in summary, the seated configuration was reported to be more comfortable, was recognized to encourage more stretching/reaching because participants could not move their lower body towards the dots, and was seen as more accommodating of medical conditions. the standing configuration, in turn, was preferred by some participants because it was easier to reach the dots while standing, especially for shorter participants. another reason given for preferring to be seated is that some participants felt they had to be more wary of hitting objects in the real world while standing, because they were not grounded on a chair. to evaluate the functionality of our portable creative therapy game, we tested our game in the apartment of a lab member, following covid- guidelines. this setting strengthens our evidence that our therapy game is compatible with remote therapy and the shelter-at-home mandate. with our well-defined study design, we managed to conduct a relatively comprehensive data collection. however, as with any study, there were some limitations to our pilot study. this was our preliminary study to objectively evaluate our vr therapy game, and thus we tested it with convenience sampling; our pool of participants consisted of young, healthy college students. we anticipate testing the creative therapy game with actual patients in need of upper extremity therapy in the future. this project contributes not only to upper limb rehabilitation, but to therapy of all disciplines. if we can effectively integrate vr into pt, patients will be more comfortable and more engaged in their recovery. intensive, repetitive, task-oriented recovery tasks, which are proven to be the most effective form of pt, can take the form of an entertaining, immersive vr game, from creative drawing for the upper body to versions of soccer to work on footwork [ , , ] . our project provides evidence that we can make a simple, familiar, portable game that is also helpful, encouraging, and distraction-free for remote therapy. due to "stay at home" orders and the need to socially distance in response to covid- , portable physical therapy is necessary. the demand for remote rehabilitation is high, as meeting with a physical therapist would increase the chance of infection and hospitalization [ ] . moreover, even those not suffering from ue mobility impairments are now feeling an increase in depression, stress, and anxiety from the global emergency [ ] . 
our proposed therapy game allows users to do their therapy in the comfort of their homes, helping both the physical and emotional health of individuals. the immediate next steps of this project are to conduct studies with larger-scale, representative participant pools to validate our findings. because this game would be targeted to ue patients, we would need participants with upper limb impairments to collect data from. from a technical point of view, the therapy game and data-collection software need to be more user-friendly for patients with no technical background to easily use at home. it would benefit the patient if they could receive real-time feedback on their performance and compare it to previous performances on one easy-to-use interface. also, using a creative physical therapy game requires the user to play the levels multiple times a day for the therapy to be truly effective [ ] . therefore, we need to add more levels so that the game stays challenging, engaging and entertaining for the user. more levels would also allow us to address the needs of multiple patients and market to not just stroke, parkinson's disease or cp patients, but to all patients with upper extremity impairments. in this article, a creative virtual therapy game was introduced and tested with a preliminary participant pool of students. the results from participants' movement accuracy, task completion time and usability questionnaires indicate that participants had significant performance differences on the two levels of the game based on its difficulty (between-subjects factor), but no difference was observed between the seated and standing configurations (within-subjects factor). this means that both of these configurations can be used interchangeably, for instance in future clinical applications with some further considerations, without introducing a risk of lowered performance or accuracy due to the study configuration. the hard mode was also more popular among participants. these findings suggest great potential for future applications of the game in remote physical therapy for upper-extremity mobility during the covid- shelter-at-home reality. we would like to acknowledge and thank all members of the research team at the affiliated university. we also wish to express our gratitude to the participants who kindly took part in our study. we gratefully acknowledge the 'sponsor' program for sponsoring this project with a grant from the 'grantsponsor'. any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors. 
haptic retargeting: dynamic repurposing of passive haptics for enhanced virtual reality experiences
training intensity affects motor rehabilitation efficacy following unilateral ischemic insult of the sensorimotor cortex in c bl/
virtual reality exposure-based therapy for the treatment of post-traumatic stress disorder: a review of its efficacy, the adequacy of the treatment protocol, and its acceptability
d user interfaces: theory and practice
influence of virtual reality soccer game on walking performance in robotic assisted gait training for children
experimental methods: between-subject and within-subject design
exploratory findings with virtual reality for phantom limb pain; from stump motion to agency and analgesia
the essential role of home- and community-based physical therapists during the covid- pandemic
virtual reality exercise as a coping strategy for health and wellness promotion in older adults during the covid- pandemic
effects of virtual reality-based exercise imagery on pain in healthy individuals
self-administered, home-based smart (sensorimotor active rehabilitation training) arm training: a single-case report
global storm of stress-related psychopathological symptoms: a brief overview on the usefulness of virtual reality in facing the mental health impact of covid-
likert scale: explored and explained
upper body leaning can affect forward self-motion perception in virtual environments
covid- and the advancement of digital physical therapist practice and telehealth
virtual reality and pain management: current trends and future directions
the development of a home-based virtual reality therapy system to promote upper extremity movement for children with hemiplegic cerebral palsy
virtual reality in psychiatric disorders: a systematic review of reviews
extended lazynav: virtual d ground navigation for large displays and head-mounted displays
taking steps: the influence of a walking technique on presence in virtual reality
surgical navigation inside a body. us
a proposition and validation of a questionnaire to measure the user experience in immersive virtual environments
unfulfilled rehabilitation needs and dissatisfaction with care months after a stroke: an explorative observational study
experiences of upper limb somatosensory retraining in persons with stroke: an interpretative phenomenological analysis
results and guidelines from a repeated-measures design experiment comparing standing and seated full-body gesture-based immersive virtual reality exergames: within-subjects evaluation
standing in vr: towards a systematic classification of challenges and (dis)advantages
remain seated: towards fully-immersive desktop vr
key: cord- -sp o h authors: raskar, ramesh; nadeau, greg; werner, john; barbar, rachel; mehra, ashley; harp, gabriel; leopoldseder, markus; wilson, bryan; flakoll, derrick; vepakomma, praneeth; pahwa, deepti; beaudry, robson; flores, emelin; popielarz, maciej; bhatia, akanksha; nuzzo, andrea; gee, matt; summet, jay; surati, rajeev; khastgir, bikram; benedetti, francesco maria; vilcans, kristen; leis, sienna; louisy, khahlil title: covid- contact-tracing mobile apps: evaluation and assessment for decision makers date: - - journal: nan doi: nan sha: doc_id: cord_uid: sp o h a number of groups, from governments to non-profits, have quickly acted to innovate the contact-tracing process: they are designing, building, and launching contact-tracing apps in response to the covid- crisis. 
a diverse range of approaches exist, creating challenging choices for officials looking to implement contact-tracing technology in their community and raising concerns about these choices among citizens asked to participate in contact tracing. we are frequently asked how to evaluate and differentiate between the options for contact-tracing applications. here, we share the questions we ask about app features and plans when reviewing the many contact-tracing apps appearing on the global stage. more than , deaths are now attributed to the global covid- pandemic. many thousands more lives are expected to be lost before we have brought the disease under control and are capable of managing future spikes in the number of cases. in an effort to both slow and stop the disease, communities across the world have halted everyday life, requesting or requiring their residents to close non-essential businesses, stop going to school, and stay home. digital initiatives hope to support safe and wellconsidered approaches to the reopening of our societies while simultaneously reducing the human loss of life by giving frontline officials modern tools with which to control this pandemic. one particular set of modern digital tools aims to upgrade contact-tracing capacity, typically a lengthy and laborious process. in addition to increasing the speed with which contact-tracers can reach those who have been exposed to the disease, these tools can increase the accuracy of contact tracing. however, many first-generation digital contact-tracing tools have paved the way for a post-pandemic surveillance state and the mistreatment of private, personal information. privacy must remain at the forefront of the global response, lest short-term pandemic interventions enable long-term surveillance and abuse. the design and development of the next generation of contact-tracing tools offers an opportunity to sharply pivot to solutions using privacy-first principles and collaborative, open-source designs. these tools present an opportunity to save lives by flattening the curve of the pandemic and to provide economic relief without allowing privacy infringements now or in the future. covid- virus transmission occurs for several days before a person shows any symptoms. during this time, a person going about their daily life may interact with, and possibly pass the infection to, as many as a thousand people. without knowing they are infected, an individual who has only mild symptoms or is asymptomatic may continue to interact with others, further spreading the virus. this creates an exponential rise in infections. stopping the spread of covid- with pharmaceutical treatments and vaccines remains at least - months away from widespread availability. therefore, public health countermeasures, such as social distancing, offer the only possibility of stopping virus proliferation in the near future. when applied broadly, such measures disrupt every aspect of society and risk economic collapse. already, unemployment rates have skyrocketed, tenants are struggling to pay rent, and critical supply chains, including the food supply chain, have been interrupted. the longer strict social distancing measures remain in place, the more severe the consequences for economies and societies will be. however, if social distancing measures are lifted too quickly, the virus will spread once again, claiming many additional lives. 
the contact-tracing process evaluates the recent location history and social connections of those who become infected and notifies the people they have interacted with of their exposure to the virus. in this way, contacttracing methods allow targeted measures (e.g., quarantining, virus testing) to be applied only to exposed individuals. traditionally, public health officials perform contact tracing manually, by interviewing patients diagnosed with a disease about their activity over the past days or weeks. then, officials reach out to people who crossed paths with the patient during the time the patient was contagious and recommend targeted interventions to prevent further spread of the disease. widespread, rapid transmission of a virus by respiratory droplets, as in the case of covid- , challenges the practicality of the traditional contact-tracing process. manual tracing is resource intensive, is time consuming, and will, at best, be limited to contacts within the social circles of the infected-and thus cannot trace strangers effectively. furthermore, the patient being interviewed is often extremely ill and at risk for memory errors during the interview. digital contact-tracing tools may help mitigate these challenges. today, almost half of the world's population carries a device, such as a smartphone, capable of gps tracking and bluetooth communication with nearby devices. each device is able to create a location trail-a timestamped log of the locations of an individual, as well as a list of anonymous id tokens that are collected when the device user crosses near another device. by comparing the device users' location trails or the anonymous id tokens they have collected with those from people who have covid- , one can identify others who have been near the person who is infected; this facilitates contact tracing in a more accurate and timely manner than the traditional manual approach. several pilot programs, particularly in china and south korea, have demonstrated the technical feasibility of contact-tracing applications as tools to help contain the covid- outbreak within a large population. however, these programs also highlight the very real risks that exist with the use of such technologies. a location trail and list of nearby device ids contains highly sensitive, private information about a person: everything from where they live and work and which businesses they support, to which friends and family members they visit. location data can be used to identify people who are infected and might then be targeted by their community. for example, data sent out by the south korean government to inform residents about the movements of persons recently diagnosed with covid- sparked speculations about the individuals' personal lives, from rumors of plastic surgery to infidelity and prostitution. more frightening still, enabling access to a person's location data by a third party, particularly a government, opens a path to potentially unrestrained state surveillance. in china, users suspect that an app developed to help citizens identify symptoms and their risk of carrying a pathogen was used to spy on them and share personal data with the police. care must be taken in the design of such apps. a number of groups, from governments to non-profits, have quickly acted to innovate the contact-tracing process: they are designing, building, and launching contact-tracing apps in response to the covid- crisis. 
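the comparison step described above can be sketched very simply. the python fragment below is a hypothetical illustration, not the implementation of any particular app: it checks a healthy user's gps location trail against the published trail of a person diagnosed with covid- , and intersects the sets of bluetooth tokens; the time window and distance threshold are arbitrary values chosen for the example.

```python
# hypothetical illustration, not any specific app's implementation: compare a
# healthy user's records against the published records of diagnosed users.
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """great-circle distance between two lat/lon points, in meters."""
    r = 6371000.0
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))

def find_location_overlaps(my_trail, infected_trail, time_window_s=1800, radius_m=10):
    """gps variant: trails are lists of (timestamp_s, lat, lon) points."""
    overlaps = []
    for t1, lat1, lon1 in my_trail:
        for t2, lat2, lon2 in infected_trail:
            if abs(t1 - t2) <= time_window_s and haversine_m(lat1, lon1, lat2, lon2) <= radius_m:
                overlaps.append((t1, lat1, lon1))
    return overlaps

def find_token_matches(my_heard_tokens, infected_published_tokens):
    """bluetooth variant: intersect the anonymous ids this phone has heard
    with the ids a diagnosed user chose to publish."""
    return set(my_heard_tokens) & set(infected_published_tokens)
```

the 30-minute window and 10-meter radius above are example values only; a real system would need to tune such thresholds to what is known about virus transmission, and to where the comparison is allowed to run, which is the subject of the sections that follow.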
a diverse range of approaches exist, creating challenging choices for officials looking to implement contact-tracing technology in their community and raising concerns about these choices among citizens asked to participate in contact tracing. we are frequently asked how to evaluate and differentiate between the options for contact-tracing applications. here, we share the questions we ask about app features and plans when reviewing the many contact-tracing apps appearing on the global stage. among the questions we are asking is whether the app is open source. an open-source approach lets programmers and other experts outside the app development team review the code for a project. these outside programmers can make improvements, copy the code, or use it to create something entirely new. open source offers a layer of trustworthiness. because the code is publicly available, it can be reviewed by experts around the world to confirm it works the way the development team says it should. there are, at times, valid reasons to not use an open-source approach, such as when a business is seeking to develop a proprietary technology. during the covid- crisis, we believe that open-source projects promote collaboration and foster community. contact-tracing apps require the use of a data source to infer contact between two people: two of the most useful data sources are gps location data and bluetooth broadcasting. gps-based apps create a "location trail" for each user by recording their time-stamped gps location. if a person catches covid- , they can share their location trail with the responsible authority: the health worker, public health official, government official, or app creator. the authority then releases some or all of the location trail for other users to compare to. in some applications, the person who is infected might be able to directly share their location trail with other users. other apps rely on bluetooth to determine with whom the person who is infected has crossed paths. such apps create a unique identifier, a number or token, which the app broadcasts to nearby devices. the user's phone then records the identifiers of other phones it has been near. if a person becomes infected, their unique identifiers can be compared to those stored by other users to determine who the infected person has crossed paths with. in some cases, such as the singapore tracetogether app, the central authority stores user information and can determine the user's phone number and identity from an identifier. in others, such as covid watch and coepi, the identifiers provided by the person who declares themselves to be infected cannot be used by the central authority to determine the person's real-world identity. both approaches offer distinct advantages and challenges: gps-based approach • allows for estimation of exposure related to surface transmission of disease. unlike bluetooth, gps-based systems can notify users if they were in a location shortly after a person infected with covid- was there, when the chance for exposure to the virus through commonly touched surfaces is high. • enables users to import historical data. other applications on the users' phones, such as google maps, are already collecting the potential user's location histories before they install the contact-tracing app. when users import these historical data, the app can alert the user to potential exposures from their location history, even before they downloaded the app. • provides redacted, anonymized gps data to help public health officials follow the spread of disease within a community. 
• is able to record the user's location history using a small amount of data, making scaling and implementation in regions with high data costs more likely. bluetooth-based approach • uses signal strength, which is reduced by walls and other barriers, to estimate the distance between users. in some places, such as a large, multi-floor building, this estimate more accurately reflects the chance of exposure to disease than a gps-based approach. • uses time-range-dependent, randomly generated numbers as ids to ideally achieve relative anonymity. • requires the use of a compatible app by other users to record possible exposures. if an app is not widely adopted, the potential utility is limited. • no potential to collect historical data from before the user downloaded the app. in the near future, some solutions, including covid safe paths, will integrate both approaches, allowing the user to harness the advantages of each while mitigating some challenges. examples include aarogya setu (india), which uses both gps and bluetooth, and tracetogether (singapore), which uses bluetooth. some bluetooth-based apps use a fixed identifier, meaning the unique number assigned to the device does not change and is permanently associated with the user. time-variable identifiers change on a set time interval, such as an hour, so each user is associated with many different identifiers. the use of time-variable identifiers adds a layer of privacy protection by making it difficult for a third party to track a particular phone over time based upon a single identifier. in a centralized version of contact tracing, location and contact data are collected and consolidated centrally by a single authority, often a government entity. china utilized a centralized approach with its app. other information about the user, such as mobile telecommunication service provider or payment data, may be collected and paired with the location data. the central authority identifies people who are infected, determines their contacts, and requests specific actions by those who may have been exposed to the virus. centralized systems create powerful tools for analysis and public health decision making. however, such systems also expose a person's data to a central authority, creating an opportunity to undermine the person's privacy. in a decentralized approach, the healthy user's data never goes to a central server. location data are stored and processed on the phone of the user. only the location data of people confirmed to be infected need to be shared. tools, such as redaction and blurring of the infected person's data, can be used to help preserve their privacy. an israeli app, track virus, is an example of a decentralized approach, as is covid safe paths. decentralized systems typically offer greater privacy protection and are, therefore, more in line with privacy requirements and regulations such as gdpr. some utility may be lost compared to centralized systems, as collection and aggregation of large data sets from users can be used for beneficial public health research. however, as we consider the various approaches, the grave privacy risks associated with centralized systems far outweigh the limited additional benefits, leading us to highly value decentralized approaches. when checking if a healthy user has been exposed to covid- , contact-tracing apps may either push the healthy user's data to the authority (centralized processing) or pull a list of locations and/or contact ids of those who have been infected from the authority (decentralized processing). 
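before turning to the push and pull models in detail, the sketch below illustrates the time-variable identifier idea described above in a decentralized, on-device form. it is a hypothetical illustration, not the protocol of tracetogether, covid watch, or any other named app: the device broadcasts a fresh random token each interval, stores the tokens it hears, and compares them locally against tokens published for diagnosed users.

```python
# hypothetical illustration of time-variable identifiers with on-device (pull)
# matching; this is not the protocol of any specific contact-tracing app.
import secrets
import time

ROTATION_SECONDS = 3600  # e.g., a fresh identifier every hour

class TokenStore:
    def __init__(self):
        self.my_tokens = []        # (interval_index, token) this device broadcast
        self.heard_tokens = set()  # tokens received from nearby devices

    def current_token(self, now=None):
        """return the token to broadcast, rotating it once per interval."""
        interval = int((now if now is not None else time.time()) // ROTATION_SECONDS)
        if not self.my_tokens or self.my_tokens[-1][0] != interval:
            self.my_tokens.append((interval, secrets.token_hex(16)))
        return self.my_tokens[-1][1]

    def record_heard(self, token):
        self.heard_tokens.add(token)

    def check_exposure(self, published_infected_tokens):
        """pull model: the comparison happens on the user's own phone, so the
        healthy user's data never leaves the device."""
        return self.heard_tokens & set(published_infected_tokens)
```

because the identifier changes every interval, a single observed token cannot be used to follow one phone across days; because matching happens locally, the healthy user never has to upload anything.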
with a push, the healthy user's data is pushed (shared) off of the user's device and is compared by the authority to the data of people who have been infected. this exposes a large amount of data to the authority. in a pull model, an anonymized history of location data or identifiers from people who have been infected are pulled onto the healthy user's device so that the comparison can take place locally without compromising the privacy of healthy individuals. given what is known to date about person-to-person transmission of covid- , contact-tracing apps can properly assess users' potential exposure to the virus if they take four important factors into consideration: • the distance between the person who is infected and the user. • the length of time the person who is infected and the user occupied the same space. • how many days prior to becoming infected the person interacted with the user. • whether or not the user may have had contact with contaminated surfaces after interaction with the person who is infected. a location history must be collected from a person who has been diagnosed with covid- in order for contact tracing to occur. several approaches are being piloted. in general, these approaches fall into two categories: • an authority (public health official, healthcare provider, government official) collects the location history from the person who is infected and makes it available to users of the app. • the patient self-reports symptoms and directly shares their data with other users of the app. use of an authority offers the advantage of confirmation that the person has covid- . the overlap of symptoms between covid- and other common respiratory illnesses might cause someone to suspect they have covid- when they actually have the flu or a common cold. systems where people self-report themselves as infected pose the risk that people with symptoms, but without a confirmed diagnosis, share their location trail. self-reporting approaches are also at risk from bad actors who may misreport their status as infected in order to create chaos and fear. however, self-reporting systems have the advantage of fuller consent of the infected person as the person definitively decides to share their location trail without influence from an authority figure. when evaluating contact-tracing solutions, we seek to understand how data will be collected from the person who is infected and how the solution will confirm that the person truly has covid- . at the base of every contact-tracing app lies an algorithm that determines whether the app user has been exposed to people who are infected and might have an increased chance of being infected themselves. the algorithm integrates many factors, such as the distance between the users, the length of time the users were in the same location, or the amount of time between the contact and the start of symptoms. two apps with different algorithms will potentially give a different likelihood of exposure to the same user. understanding the algorithm used is necessary for public health officials and healthcare providers to provide appropriate guidance to users who receive an exposure notification. contact-tracing app developers must clearly communicate their algorithm with all stakeholders and failure to do so will be a significant red flag. location data may potentially be repurposed to achieve additional objectives beyond contact tracing. 
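to make the exposure-algorithm discussion above concrete before turning to questions of data use, here is a toy scoring function built around the four factors just listed. the weights and thresholds are invented for illustration only; real apps will encode these factors differently, which is precisely why developers need to disclose their algorithms.

```python
# toy exposure-scoring function built around the four factors listed above.
# the thresholds and weights are invented for illustration; every real app
# encodes these factors differently.
def exposure_score(distance_m, duration_min, days_before_symptoms, shared_surface):
    score = 0.0
    if distance_m <= 2.0:                          # close contact
        score += 0.4
    elif distance_m <= 5.0:
        score += 0.2
    score += min(duration_min / 30.0, 1.0) * 0.3   # longer co-location, higher risk
    if days_before_symptoms <= 2:                  # contact close to symptom onset
        score += 0.2
    if shared_surface:                             # possible surface (fomite) exposure
        score += 0.1
    return min(score, 1.0)

# e.g., 1.5 m apart for 45 minutes, one day before symptom onset, no shared surface
print(exposure_score(1.5, 45, 1, False))   # 0.9, which might trigger a notification
```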
we believe these data should be used only for response to an ongoing pandemic and that other uses should be strictly forbidden. turning app data over to law enforcement or other non-health actors, such as commercial entities seeking to target ads to potential customers, threatens users' rights and privacy. critically, this undermines public trust. without trust, citizens will not adopt contact-tracing apps at a wide enough scale to effectively control the spread of the epidemic. therefore, access to location-tracking data should be tightly limited to specific public health initiatives working on pandemic response. users should be able to confirm how their data is used. promises by the app's developers to delete data are insufficient. users should be able to check exactly what location data has been collected and stored and to confirm that their data is no longer there after the deadline for deletion (the disease's incubation period, to days for coronavirus). apps must obtain users' unforced and informed consent for any disclosure of their data. recently, the a telekom austria group shared aggregated user location data from an app not regularly used for public health purposes with the austrian government's covid- emergency management team for reasons that were not initially specified. observers believe that a 's data was most likely being used to forecast disease spread or to monitor the population for large gatherings that might transmit the virus. however, the sharing of location data with government agencies for unspecified purposes attracted criticism from privacy rights activists and created suspicions that weakened user trust, threatening long-term success. an opportunity for misuse and privacy violations arises whenever a third party, a government, a corporation, or any other entity is able to access the data of healthy users. a decentralized approach prevents privacy compromise for healthy users because they are doing all the calculations on their own phones. time-limited storage of location data also protects user privacy, for example storing only days of data and deleting everything beyond that point. all contact-tracing app development teams should clearly articulate how they protect the privacy of all users, whether healthy or infected. as an example, a preliminary draft of the privacy principles of the covid safe paths team can be accessed in covid- contact tracing privacy principles. this overview of model privacy practices explains how the application embraces principles such as privacy by design, the fair information practice principles (fipps), and legal protection by design. historical location data and nearby device ids must be collected from a person who is infected to enable contact tracing. however, both the collection and release of that information have broad implications for the privacy rights of the individual. because the person who is infected is the most vulnerable stakeholder, several efforts must be undertaken to protect their privacy to the highest degree possible. app development teams may design for privacy by utilizing a variety of approaches: • providing users with the ability to correct incorrect information. • notifying individuals about what data is collected, how long it is stored, and who will have access to it during each stage of use. • enabling people to obtain access to information about potential exposures to covid- without requiring that they consent to share their data with other parties. 
• deleting user location data after it is no longer necessary to perform contact tracing. • alignment with the fair information practice principles. • using open-source software to foster trust in the app's privacy protection claims. • limiting the amount of data published publicly. • providing tools that allow the person who has been diagnosed and their healthcare providers to redact any sensitive locations, such as a home or workplace. • end-to-end encryption of location data before sensitive locations are redacted. • eliminating the risk of third-party access to information by enabling voluntary selfreporting by the person who is infected. • supporting strict regulation around access to and usage of the data by any entity that collects it, particularly governments. • obtaining targeted, affirmative, informed consent for each use of the person's data. • providing users with the ability to see how their data is being used and revoke consent for usage of their information. requiring people who are infected or potentially infected to track their movements and disclose their contacts achieves the highest degree of efficacy in contact tracing within a community. however, if residents cannot choose to at least selectively withhold their information, they may be stigmatized, persecuted, or exploited by malicious actors on the basis of their data. voluntary reporting respects users' rights to privacy and to informed consent. it encourages app developers to include safeguards that reduce the risk for abuse of sensitive data. however, when individuals who become infected refuse to share their contact-tracing data, the accuracy of contact tracing declines, potentially contributing to misinformation and a false sense of security. we believe that no one should be forced to relinquish highly sensitive personal data. we dislike solutions that require potential users to consent to share their data if they become infected in order to access information about whether or not they have crossed paths with someone who was infected. incentives such as those outlined in the following sections should be implemented to encourage users who become infected to share their data. people who are healthy should also proactively choose to use a contact-tracing app rather than being mandated to do so. potential users should be encouraged to do so by incentives, such as the opportunity to take control of their information to benefit their health, strong privacy protection policies, trust in the app's developers, clear communication, and informed consent. in order to roll out a contact-tracing app on a global scale, three groups must work together: a substantial team to create and promote the app; large, trusted institutions to support development and deployment of the app; and local, onthe-ground partners in the various communities in which the app is deployed. contact-tracing apps are tools, not complete solutions. disease containment utilizing these tools requires multidisciplinary collaborations across the technology, healthcare, public health, and government sectors. we are working hard to create these partnerships for covid safe paths and look for such partnerships in other apps we evaluate. among those partnerships teams should be seeking to build are: • cloud players (aws, azure, gcp, etc.) • mobile carriers and local telecommunications providers. 
• partnerships with health authorities; these partnerships are particularly important in light of app store requirements for all apps addressing the covid- pandemic to have the support of a health organization • government agencies • local public health workers and healthcare providers: contact-tracing apps will only succeed if those who crossed paths with someone who became infected can receive guidance and support from local providers on what steps to take to protect themselves and their families. • current contact tracers; integrating into the current contact-tracing protocol increases the effectiveness of a contact-tracing app within a community • non-profit organizations and academic institutions we see apps aiming to deploy at a variety of levels, from a single city to an entire nation to those aiming for a global reach. regardless of the level at which they are deployed, contact-tracing apps must be paired with existing infrastructure in order to support a successful containment strategy. public health officials and healthcare providers must be ready to answer user questions, offer testing, or provide advice about what to do if someone has been exposed to a person with covid- . the resources and support necessary to follow this advice must also be made available. we look for well-considered deployment strategies with aggressive outreach to local partners. for this reason, we are building not only a contact-tracing app, but also safe places, a web-based tool for public health officials working to contain the covid- pandemic. it is also worth noting that as global travel resumes, cross-communication between apps operating in different regions will be necessary to achieve global containment of covid- . we look for teams that are thinking ahead and building the technological foundation for this collaboration into their application. taking any software tool from idea to widespread solution requires the team to think creatively. contact-tracing apps gain value with each additional user. many approaches to encouraging user adoption exist, and good teams will use a variety of them. a few steps we encourage are: • fostering trust • developing key partnerships, including with community officials who can help drive local support for the solution • creating solutions that meet the needs of public health officials responding to the pandemic • focusing on the needs of the users • providing value to the user during a contact-tracing interview even if they choose not to download the app before they have been diagnosed with covid- . contact-tracing apps need a strong value proposition for each stakeholder: the healthy user, the person who is infected, the public health worker responsible for contact tracing, the public health authority responsible for the community's response to the pandemic, and government officials tasked with coordinating the local or national response to covid- . as an example, the incentives for each stakeholder from the safe paths solution are presented here. for the healthy user, the app offers an opportunity to take control and gain information. the user is able to make decisions about where they should be going and what activities are safe for their families and themselves. users are more confident and more informed about their actual risk of spreading the disease. for the person who is infected, it gives the ability to quickly and accurately share location history with public health contact tracers. sharing their history offers an opportunity to help protect their community. for the public health worker responsible for contact tracing, it gives immediate relief and provides a tool to more efficiently conduct interviews and gather information from patients. it increases data accuracy over current methods (e.g., relying on the patient's memory) and enables them to work with infected patients to quickly remove information that the patient asserts is personal, private, and/or confidential. for the public health authority, it allows more efficient and more accurate data collection and analysis about the spread of covid- within their jurisdiction, and provides data to make better, more targeted recommendations for intervention to their community and to utilize limited testing resources most constructively. for government officials, it offers an opportunity to communicate a personalized risk profile to each citizen, answering the question "should i be concerned or not?" for every individual in their constituency, and to closely monitor those who have the highest chance of experiencing complications from covid- . faster and more accurate contact tracing allows officials to catch up with the virus and more effectively deploy resources. rather than undifferentiated application of lockdown measures risking economic and subsequent financial collapse, officials are able to implement a differentiated approach with targeted measures as recommended by the who. the utmost care must be taken when notifying users of a potential exposure to covid- , given the serious health, economic, and social consequences of a notification. during this stressful time, clear, easy-to-understand communication reduces the possibility that the user will misjudge their situation. high-quality translations should be available for all users. transparency about how the decision to notify the user was made helps the user and their public health officials decide whether, and which, containment measures the user needs to undertake. notifications should evolve to reflect advances in the understanding of disease transmission as scientists around the world continue to clarify how covid- passes from person to person. 
a process for quickly removing identifiable information from public access should be in place. notification of a potential exposure to covid- will be frightening to many, particularly those at increased risk for serious complications, and may lead to panic among users. large groups of people seeking medical evaluation or demanding testing could quickly overwhelm an already strained healthcare system. we have seen panic related to the pandemic lead to hoarding and vigilantism. conversely, users who are not notified of a potential exposure may assume they are at no risk of catching covid- and disregard critical social distancing and hygiene recommendations. any contact-tracing solution will need to provide users with accurate information to reduce the chance for panic or risky behavior. when reviewing an app, we look for the following: • clear, easy-to-understand, culturally appropriate communication with the user • engagement of epidemiologists, public health officials, and healthcare providers, both as core members of the decision-making team and as local partners within the community to which the app is deployed, in order to provide assessment and recommendations to people who may have been exposed to covid- • measures to prevent individuals from falsely reporting themselves infected and thoughtful consideration of how a person reported to be infected is confirmed to have covid- • use of both gps and bluetooth systems, utilizing the strengths of each technology • creative algorithms that reduce the chance that insignificant exposures are flagged. contact-tracing apps should be viewed as tools to be utilized by experts in infectious disease control. epidemiologists, public health officials, and healthcare providers must be core members of any team designing and implementing a contact-tracing app. we look to see that such experts are included as team members, mentors, and strategic partners. ideally, contact-tracing apps should fit into the current care pathway. one of the leaders in this area is tracetogether in singapore, which supports a contact-tracing process put in place long before the app was ready. tracetogether uses bluetooth to identify nearby phones with the app installed and tracks both proximity and timestamps. if a person is diagnosed with covid- , they can choose to allow the ministry of health to access their tracetogether data, which is then used by the manual contact-tracing team to alert those who may have been exposed. we also aim to lead in this area with the development of covid safe places, a web tool allowing public health officials to work more quickly, collect better data, and better respond to what is happening in their community. we are partnering with public health workers around the world to deploy covid safe places. the success of any contact-tracing program should be measured in lives saved. lives are saved both by a reduction in the spread of disease and by a reduction in the psychosocial and economic consequences of widespread quarantine actions. quantitative analysis of the effect of this new technology should be undertaken, not only to allow for further improvements during the current covid- pandemic, but also to better address the next outbreak of infectious disease. in addition to collecting real-world data about the impact of contact-tracing apps, teams should work to communicate their success to the public. 
if the apps are effective in helping to control the pandemic, the public may fail to notice the extent to which their use was critical to the community's ability to control the spread of disease. the covid- pandemic will not last forever. if we falter in our response and choose digital contact-tracing tools that compromise individual privacy for efficacy, the consequences will extend long after the last store has reopened and the last child has returned to school. we believe privacy does not have to be compromised in order to reduce new infections and slow the spread of disease. we are building covid safe paths with privacy protection at the forefront for this pandemic and the next. here, we have begun to detail the key questions that should be asked as we evaluate contact-tracing apps developed and deployed against the covid- pandemic. we plan to continue this discussion and are committed to serving as a resource for countries, states, cities, and individuals throughout the world. we welcome additions to and modifications of this report and analysis. to submit a change please email info@pathcheck.org
key: cord- -rj i v authors: wang, jiexiang; guo, bin; wang, xiaoyan; lou, shuzhen title: closed or open platform? the nature of platform and a qualitative comparative analysis of the performance effect of platform openness date: - - journal: electron commer res appl doi: . /j.elerap. . sha: doc_id: cord_uid: rj i v internet platform enterprises have become one of the dominant organizational forms for internet-based businesses. 
despite the strategically crucial role that the openness decision plays for internet platform enterprises, the results of existing research on the relationship between platform openness and platform performance are not conclusive. as to the nature of platforms, the transaction attribute has been overemphasized while the innovation attribute is mostly neglected. through decomposing platform openness into supply-side openness and demand-side openness, as well as introducing demand diversity and knowledge complexity as contextual variables, this study attempts to understand the impact of both types of attributes on performance by considering their configuration. using the fuzzy-set qualitative comparative analysis (fsqca) method, we find that high demand diversity of platform users and high supply-side openness will lead to better platform performance. moreover, high knowledge complexity required for platform innovation together with high supply-side and demand-side openness will contribute to a high level of platform performance. innovation attributes should be taken into consideration in research on the relationship between platform openness and performance. in light of the concerns mentioned above, this study intends to resolve the mixed findings by decomposing the underlying dimensions of platform openness and introducing two contextual variables. considering the bilateral structure of platforms (gawer and cusumano, ), we decompose platform openness into two dimensions, namely supply-side openness and demand-side openness. we argue that boundary conditions will affect the relationship between the two dimensions of platform openness and platform performance. from the transaction perspective, the performance effect of platform openness depends on platform users' demand diversity, because it can help eliminate the competitive crowding effect among supply-side users (hagiu, ). from the innovation perspective, the performance effect of platform openness depends on the knowledge complexity required for platform innovation, since a diversified user base is required to provide heterogeneous information and resources relevant to product and service innovation on the platform (baldwin and von hippel, ). extant studies on platform openness are more concerned with supply-side openness than with demand-side openness (parker and van alstyne, ). this study contributes to platform openness decision research by purposively integrating the transaction and innovation attributes of platforms. there are inconsistent findings on whether a more open platform is associated with better performance (casadesus and hałaburda, ). this study reconciles the mixed findings in the literature by providing a conceptual approach that decomposes platform openness into supply-side openness and demand-side openness. in addition, this study emphasizes the importance of taking platform characteristics into consideration. in fact, platform performance is a mixed result of the interdependence between platform openness and these characteristics (misangyi et al., ). to address this causal complexity issue, this study examines the configurational effects of the openness dimensions, demand diversity and knowledge complexity on platform performance using the fuzzy-set qualitative comparative analysis method (qca), a widely used method in configuration analysis (jenson et al., ). we test our hypotheses using a data set collected from ipes in china.
china is an ideal setting for our analysis since many ipes in china have achieved business success. chinese internet giants, such as alibaba and tencent, have become the leading ipes in the world. as such, chinese ipes provide a rich context to investigate the impact of platform openness strategy on their performance. in the field of e-commerce research, both e-commerce platforms and social media platforms have been investigated (chen and yu, ; zhang et al., ) . however, the prior studies have not paid enough attention to the influence of platform heterogeneity. it is worth noting that transaction and innovation attributes for these two kinds of platforms are quite different and thus should be managed by different strategies. our study emphasizes the necessity of taking platform attributes into consideration in building theoretical model. this study is organized as follows. theoretical background and research hypotheses are introduced in the next section. details as regard to the research method and the qca are explained in the third section. then we report the results. discussion, summary and theoretical implications are presented in the last two sections. the relationship between platform openness and platform performance is essentially an issue of firm governance. it is about how platforms handle their relationships with platform participants (tiwana, ) . because the enterprise is an entity of both transaction and innovation attributes (alchian and demsetz, ; simon, ; williamson, ) , platform openness decisions need to integrate both the transaction governance logic and the innovation governance logic (felin and zenger, ) . however, most of the existing studies stand on either position and follow only one governance logic. a handful of available studies on the nature of platforms mainly discuss the trade-off between product variety and competitive crowding effects (boudreau, ; huber, kude and dibbern, ) with an exclusive focus on the transaction attribute of platforms. existing research neglects the ambidextrous attributes of platforms, which may lead to inconsistent conclusions. to solve this problem, this study introduces two contextual variables, i.e., demand diversity and knowledge complexity, to capture the ambidextrous attributes of platforms. on the one hand, the platform openness governance based on the transaction attribute is related to managing product variety and improving matching efficiency (cennamo, ) . the contextual condition affecting such governance logic is the platform users' demand diversity (ghazawneh and henfridsson, ) . on the other hand, the platform openness governance based on the innovation attribute is associated with managing knowledge complexity and improving collaborative efficiency. in this case, the contextual condition becomes the knowledge complexity for product and service innovations on the platform (alexy et al., ; west, ). supply-side users mainly provide products or services required for the transaction in the platform architecture. when platform openness is high, the number of suppliers who can access the platform will increase. this increase in supplier size may lead to two opposite results as follows. for one thing, it may bring about an increase in the number and the variety of complementary products, attract more users to the platform and promote platform performance under indirect network effects (lin and daim, ) . 
for example, boudreau ( ) explain that the users' demand diversity is the reason why supply-side openness on the computer system platform, such as palm, microsoft windows ce, symbian, and linux platforms, keeps improving to introduce external developers. for another, as suppliers' size increases, the homogenization of products or services resulting from competitive imitation among suppliers may lead to the effect of competition and crowding out (casadesus and hałaburda, ) . excessive competition may also lead to adverse selection problems and the decline of the platform quality. eventually, demand-side users will flee away and the platform would fail in the end (boudreau, ). the relative explanatory power of the two above-mentioned mechanisms will be strongly influenced by the demand diversity of the users (hagiu, ) . different platforms have different types of users with various preferences. when the level of diversification of users' needs is high, a high degree of supply-side platform openness will lead to an increase in suppliers and then enhanced quantity, quality and variety of products or services. under this circumstance, the platform can better match the needs of users and attract more users to join in. moreover, higher demand diversity means that there are many niches for suppliers to cater to (freeman and hannan, ) and it is less possible to become a "winner takes all" market. thus, there are opportunities for many suppliers to flourish without competing directly. when a balance of supply and demand is reached, the platform performance will increase. in latest literature, researchers also paid attention on the influence of user preferences on platform strategies (panico and cennamo, ) . therefore, we propose the following hypothesis . the configuration between high demand diversity of platform users and high supply-side openness will lead to high platform performance. from the innovation attribute perspective of the platform, products and services innovations within the platform architecture are derived from the value co-creation activities among the platform, supply-side users and demand-side users (ceccagnoli et al., ) . higher platform openness can increase the total number of suppliers and users, resulting in an increased amount of resources and knowledge assets (inkpen and tsang, ) . especially, non-redundant information and resources will help improve platform innovation performance (barney and clark, ) . at the same time, an increase in the number of platform users may lead to coordination problems in knowledge sharing, delivery and integration (hansen, ; von hippel, ) . the relative explanatory power of the above two mechanisms will greatly depend on the knowledge complexity required for platform innovation. when the level of knowledge complexity required for platform innovation is high, knowledge diversity brought by the increase of users will be fully exploited (cassiman and valentini, ) and more opened platforms will obtain more returns on platform performance. in addition, platform product and service innovations require knowledge supply from both supply-side users and demand-side users. with the help of indirect network effect, a wider range of interactions between supply-side users and demand-side users at high levels of platform openness can build reputation and trust among users. the above interaction is essential to solve the coordination problem of complex knowledge innovation and facilitate tacit knowledge transfer for platform innovation (baldwin et al., ) . 
similarly, the study of randhawa et al. ( ) on online community platform "nexus" also emphasizes the significance of platform openness and participation of multilateral users. therefore, we propose the following hypothesis . the configuration of high knowledge complexity of platform innovation with high levels of supply-side and demand-side openness will lead to high platform performance. in summary, figure presents the theoretical model. the qca method can be further divided into crisp sets qualitative comparative analysis (csqca) and fuzzy sets qualitative comparative analysis (fsqca). the csqca shows that there are only two values of " or " for research variables, and " " indicates the complete membership of the collection while " " indicates the complete non-affiliation of the collection. the fsqca is an extension of csqca and the assignment of variables is in the " - " range, allowing the existence of "partial affiliation". for the purposes of this article, there is a partial affiliation in variable assignment. therefore, the method used in this study is fsqca. qualitative comparative analysis (qca) has been widely used in management research in recent years (fang et al., ; misangyi et al., ) . for one thing, the qca method embraces the causal complexity and turns to a configuration effect analysis of various factors on the results instead of analyzing single factor isolated effect. for another, qualitative comparative analysis can deal with large sample survey data and small sample case coding data (ragin, ; xie et al., ) . qualitative comparative analysis is suitable for this study for two reasons. first, the qualitative research method focuses on the complex mechanism of cause and effect (ragin, ) , which fits well with the idea of focusing on the ambidexterity attributes of the platform. it integrates and analyzes the influence of platform openness and contextual variables on platform performance. secondly, as a new research field, sample collection for platform research is quite limited and there is still a lack of research database. therefore, the qca method suitable for small samples is very useful in this study (xie et al., ). the qca method is based on an overall reflection of quantitative and qualitative data (schneider and wagemann, ) . for the measurements of variables involved in this article which are mainly collected from existing literature, we try to find the corresponding supporting materials in the platform context. specifically, using fsqca requires assigning values to all variables through coding based on sample information. for the assignment method in coding, there are multiple choices such as the three-valued assignment " , . , " and the four-valued assignment " . , . , . , . ". for the four-valued assignment, " . " represents no membership at all, " . " represents that the degree of non-subordination is greater than subordination, " . " represents that subordination degree is greater than non-subordination, and " . " represents full subordination. choice of different assignment method depends on the specific research situation (ragin, ) . through classification and sorting of sample data, the four-valued assignment can reflect the differences between sample data more finely in this study. the specific settings for assigning the corresponding variables are as follows. openness is the open threshold strategy adopted by the ipes which play leading roles in the platform-based ecosystem. the openness measure mainly refers to the depth and breadth of openness (cf. 
laursen and salter, ) . based on the research of boudreau ( ), this study defines platform openness as the open access and the open architecture. access openness is the entry threshold for users to platforms. open architecture is the degree of user participation in the innovation of the architecture after user entering the platform. specifically, supply-side openness refers to the openness of ipes to products or service suppliers, while demand-side openness is the openness of ipes to users. the higher the platform openness is, the higher the participation of users is. demand diversity. demand diversity of users is an important factor to be considered in the development strategy of platforms. the demand diversity of platform users depends on the functional positioning of the platform (hagiu, ) . a segmentation platform is limited to a certain business area and has a low degree of diversity, such as the home appliance development platform in haier company (which is china's largest home appliance manufacturer). on the contrary, an integrated platform has a broader business scope with higher user demand diversity, such as amazon and taobao. thus, this study uses the business scope positioning of platform as the alternative measure of demand diversity. this study also includes the differences in the demographics of the platform user (e.g., gender, age and educational level) in data coding. knowledge complexity. knowledge complexity is an important variable in the field of knowledge innovation and governance (nonaka, ) , which refers the density of interdependencies between functional components of knowledge activities and the degree of knowledge codification (kauffman, ; hansen, ) . we measure knowledge complexity in the following two ways: the degree of knowledge diversity is measured by finding whether platforms require knowledge from just one party or all of supply-side users, demand-side users and the platform; and the degree of knowledge expression is measured by distinguishing whether the platform is biased towards simple product transactions or complex knowledge innovations (cenamor et al., ) . platform performance. ipes are building blocks connecting supply-side and demand-side users to facilitate transaction and innovation activities (gawer, ) . therefore, the measure of platform performance should be based on both the value creation activities and its outcome to supply-side and demand-side users. following cennamo ( ), this study measures platform performance through their installed base the transaction volume and the ranking of ipes. table presents the measures of variables. insert table about here. as a qualitatively oriented research method, the qca method requires theoretical sampling rather than random sampling (ragin, ) . we take the following three criteria in selecting research sample of ipes. first, ipes should exhibit the features of the platform business model and be established for more than one year. we use two key features, i.e., two-sided architecture and network effect, to judge the platform business model (gawer, ) . sample bias can be controlled by requiring the platforms to have continuous operations for more than year (gawer, ). second, we consider the sample variations in platform ambidexterity to cover both transaction attribute and innovation attribute. according to the definition of jacobides ( ), transaction platforms refer to the information display platform or two-sided markets, such as amazon and taobao. 
innovation platforms are established for developing new products and services through collaboration between suppliers and users, such as the ios platform and the haier open innovation platform. third, in order to gather sufficient data, we select those platforms which can provide first-hand information through interviews (eisenhardt, ). most of the samples are from the yangtze river delta area in china, where the authors live. although these platforms are concentrated in this area, regional factors will not have a big impact on sample selection because the operation of ipes is online and virtual and is not constrained by limitations such as time and space. we then obtain a sample of ipes as shown in table . the sample covers a number of industry sectors such as e-commerce, smart manufacturing, logistics and internet education. this sample size is similar to the general practice of recently published articles using the qca method in the field of management and business (greckhamer, ). as suggested by ragin ( ), the sample size is at least n , where n is the number of antecedent conditions. in this research, n equals , and the sample size is therefore at least . data on the same case should be collected from different channels so as to allow triangulation and ensure reliability and validity (eisenhardt, ). the data is collected from two channels. the first channel is online and offline interviews with managers of ipes as far as possible, and the data collection based on interviews covers the platforms. we also did fieldwork to access the appropriate data through project research, industry conferences and relationship networks. the second channel is secondary data, including case materials from the official websites of ipes, annual reports, internal documents and industry research reports. during the data collection process, we did not directly encode the entire sample; instead, we first conducted an exploratory analysis of four cases for two purposes. on the one hand, such a practice helps us to corroborate the rationality of the theoretical model. on the other hand, it can help us to modify and refine the coding measures of the variables. moreover, the contextual variables of the theoretical model became progressively explicit in the process of the case studies. the effect of demand diversity and knowledge complexity on the relationship between openness and performance is corroborated by the interview material, as shown in table . insert table about here. then three researchers encoded the samples one by one and obtained the coding results presented in table . the data encoding process requires strict cross-validation to ensure research reliability and validity, and data encoding by more researchers can improve research reliability (yin, ). the researchers involved in the coding process have a good theoretical foundation in the fields of tce, rbv and innovation research, which can effectively guarantee the validity of the coding. also, back-to-back coding was performed by the three-person coding group to discuss and analyze inconsistent results. combining theory with reality, a suitable code value was finally determined, and the reliability of the coding was guaranteed by cross-validation. the final coding results show that there is a certain degree of difference in the evaluation level of each variable, and the coding results also provide the materials for the fsqca. insert table about here.
calibrating the measures is the first step of the qualitative comparative analysis. the transformation of measures into sets is relatively unproblematic and employs the direct method described by ragin ( ). the method uses a crossover point as an anchor to calculate the degree of set membership. after calibrating the measures, the necessity test was conducted. if a condition is necessary, it is not suitable for fsqca (ragin, ). the study uses the consistency measure in the fsqca to judge the necessary conditions. if the consistency is more than . , the condition is necessary. the results of the necessity test are shown in table . all the consistency values of the conditions are less than , indicating that these condition variables cannot fully explain the result variable, that is, they are not necessary conditions of the result variable. the analysis of a single variable is insufficient, and further configurational combination analysis is required (ragin, ). to sum up, the necessity test of the sample has been passed. insert table about here. the next step is to verify the conditions of sufficiency after establishing the necessary conditions. the number of samples covered by a configuration combination is set to . the consistency threshold is set at . , which is recommended by ragin ( ). the results in table report the causal paths: the combinations of these causal conditions. two of these causal paths are empirically important. empirical importance stems from the degree to which the causal condition or combination of conditions explains the result. two indicators assess empirical importance: consistency and coverage (ragin, ). in this case, the overall solution consistency is . , and the overall solution coverage is . . the results indicate that the two paths cover most of the outcome. the raw coverage for the single causal paths is . and . . on the whole, this research produces two configurations and both of them comply with the consistency threshold. insert table about here. the first configuration in table shows that (a) high supply-side openness and (b) high demand diversity lead to better platform performance. the consistency is very high with satisfactory coverage. thus hypothesis is supported. specifically, the results suggest that the higher the demand diversity of platform users is and the higher the supply-side openness is, the better the platform performance can be. increasing the platform openness can increase the supply of products and services and more effectively match demand, thereby promoting platform performance (ceccagnoli et al., ). at the same time, competition among users has been strengthened along with the increase in platform openness (casadesus and hałaburda, ). however, our research results show that the demand diversity of users can alleviate the competition effect among supply-side users, thereby improving platform performance. as the interviewee of ipes said, "we are different from the platform with vertical segmentation, such as the platform for snacks and shoes. they may need to reduce the openness and focus on quality control to attract consumers. our position is to build an integrated platform to satisfy diverse demands from different consumers. therefore, we need to increase the supply-side openness and bring in a variety of suppliers, which has become our competitive advantage."
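to make the quantities used in the necessity and sufficiency tests above concrete, the following sketch shows how direct-method calibration and the consistency and coverage of a candidate configuration can be computed. it is an illustrative example only, not the authors' analysis script: the anchor points, the condition names and the membership values are hypothetical, and the actual study applies the fsqca software to the full coded sample.

```python
import numpy as np


def calibrate(raw, full_out, crossover, full_in):
    """ragin's direct method: map raw scores to fuzzy membership via log-odds
    anchored at full non-membership, the crossover point and full membership."""
    raw = np.asarray(raw, dtype=float)
    log_odds = np.where(
        raw >= crossover,
        3.0 * (raw - crossover) / (full_in - crossover),
        3.0 * (raw - crossover) / (crossover - full_out),
    )
    return 1.0 / (1.0 + np.exp(-log_odds))


def sufficiency_consistency(x, y):
    """degree to which configuration x is a subset of outcome y: sum(min)/sum(x)."""
    x, y = np.asarray(x), np.asarray(y)
    return np.minimum(x, y).sum() / x.sum()


def coverage(x, y):
    """share of outcome y accounted for by x: sum(min)/sum(y); the same formula
    also gives the consistency of x treated as a *necessary* condition."""
    x, y = np.asarray(x), np.asarray(y)
    return np.minimum(x, y).sum() / y.sum()


# example: calibrate hypothetical raw openness scores measured on a 1-7 scale
raw_scores = [6.5, 5.0, 6.8, 2.0, 4.5]
print("calibrated:", np.round(calibrate(raw_scores, full_out=2.0, crossover=4.0, full_in=6.0), 2))

# hypothetical four-value memberships for a handful of cases
supply_openness = np.array([1.00, 0.67, 1.00, 0.33, 0.67])
demand_diversity = np.array([1.00, 1.00, 0.67, 0.33, 0.33])
performance = np.array([1.00, 0.67, 0.67, 0.33, 0.33])

# the configuration "high supply-side openness AND high demand diversity" is the
# fuzzy intersection, i.e. the element-wise minimum of the condition memberships
config = np.minimum(supply_openness, demand_diversity)

print("necessity consistency of supply-side openness:", round(coverage(supply_openness, performance), 3))
print("sufficiency consistency of the configuration:", round(sufficiency_consistency(config, performance), 3))
print("raw coverage of the configuration:", round(coverage(config, performance), 3))
```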
the second configuration in table shows that (a) high supply-side openness, (b) high demand-side openness, and (c) high knowledge complexity result in better platform performance. both consistency and coverage is high for this solution. thus hypothesis is supported. with the increase of platform openness, more information and knowledge assets are carried by the users, and the heterogeneity level is increasing, which can promote knowledge innovation activities, thereby promoting platform performance (kogut and zander, ) . at the same time, the increase in openness can lead to the knowledge redundancy and management coordination problems among users. however, our results also show that knowledge complexity in platform operations can reduce the difficulty of managing coordination, thereby improving platform performance. as the interviewee of ipes said, "we are different from product release platform which is simple promotion of mature product. we analyze the demand of users, communicate with developers for preconcert development and link complete value chain operations for the platform of manufacture resources. knowledge innovation is highly complicated. it needs high involvement of users and the supply of diversified knowledge, as well as the abilities of different development agents, to co-create value. this study elaborates that there is a difference between the performance effect of demand-side openness and that of supply-side openness of ipes. though there is a consensus in the literature that a platform can be divided into the demand side and the supply side (gawer and cusumano, ) , the necessity of decomposing the platform openness has not been taken into consideration yet. that is why there is a long existing disagreement on the performance effect of platform openness (boudreau, ; cennamo, ) . as such, it is worth noting that the contents of the two underlying kinds of openness are different. the supply-side openness is the threshold of the platform for product and knowledge suppliers, while the demand-side openness is the threshold of the platform for product and knowledge consumers. high supply-side openness is related to market competition (casadesus and hałaburda, ), while high demand-side openness is associated with user engagement innovation and multilateral value co-creation (von hippel, ; sheth, ) . the study confirms that the configuration of high supply-side openness and high demand-side openness can increase platform performance. consistent with prior studies, the enhancement of openness can bring rapid growth of the platform (boudreau, ). meanwhile, this study calls for the analysis of the contextual conditions on the relationship between openness and performance of the platform. the results suggest that for the transaction attribute of platforms, the impact of platform openness on platform performance lies in the trade-off between transaction matching and competition (felin et al., ) . high demand diversity can reduce the crowding-out influence of supply-side competition which is caused by high openness (boudreau, ) . the study also reveals that the configuration of high supply-side openness and demand-side openness as well as high knowledge complexity can enhance platform performance. existing research pays more attention to the matching of supply-side and demand-side maturity requirements (hagiu, ; fang et al., ) , but neglects the essential role of multilateral knowledge innovation. 
the results in our research reveal that as regarding to the innovation attribute of platforms, the relationship between platform openness and performance lies in the trade-off between knowledge innovation and management coordination. considering that platform can promote knowledge transfer and creation between supply-side and demand-side users, high knowledge complexity can alleviate the negative effect of information redundancy which is relevant to the multilateral value co-creation in online open community (sheth, ) . especially, this study is different from the prior research on two-sided market which only focuses on the transaction attribute of platforms (hagiu, ) . in contrast, this study integrates the innovation attribute of platforms into the conceptual framework and highlights the importance of knowledge complexity in shaping the relationship between platform openness and platform performance. finally, it is worth noting that platform operations are inevitably influenced by external environment such as the control and support from government (wang et al., ) , because ipes with high supply-side and demand-side openness becomes a platform-based ecosystem (jacobides et al., ) . government agents will intervene when there are fakes in a platform due to lower threshold and less control to the quality of product and service, such as the control practices by the chinese government on the fakes of taobao in the past. on the other hand, government agents will also intervene when there are discriminatory rules on higher threshold in order to ensure fair competition within the platform-based ecosystem (gorwa, ) . although the external environment variables such as government control have not been included explicitly in the model, there is evidence to support such an influence from the interviews and the exploratory case study. future research needs to pay more attention to the impacts of institutional environment factors on platform openness decision (tiwana et al., ) . this study focuses on the relationship between platform openness and platform performance, and decomposes platform openness to supply-side openness and demand-side openness. in addition, the research introduces demand diversity and knowledge complexity as the contextual factors for the performance effect of platform openness. using fsqca, this study examines the effect of platform openness on the level of platform performance. the findings reveal that various combinations of factors including the platform openness, demand diversity and knowledge complexity determine the level of platform performance. specially, this study finds that high demand diversity of platform users and high supply-side openness will lead to better platform performance. in addition, high knowledge complexity required for platform innovation together with high supply-side and demand-side openness will contribute to a high level of platform performance. this study makes several contributions to platform openness decision research, as shown in the table . firstly, this study argues that mixed findings in the literature on performance effect of platform openness partly come from the misunderstanding of the nature of platforms. more specifically, most of the existing studies neglect the ambidexterity attributes of platforms, and only take ipes as either a transaction platform based on two-sided market (hagiu, ) insert table about here. secondly, this study explores the boundary conditions of the performance effect of platform openness. 
the empirical results reveal that the interaction between demand diversity and supply-side openness can significantly enhance platform performance. higher demand diversity means that there are many niches for suppliers to cater to and it is less of a "winner takes all market" (freeman and hannan, ) . thus, there is an opportunity for many suppliers to flourish by catering to these varied needs and these suppliers are not competing directly. as such, this study is an echo to the call for examining the impact of user heterogeneity in the field of platform research (rietveld and eggers, ) . through investigating the configuration of knowledge complexity together with supply-side and demand-side openness on platform performance, this research also echoes the latest idea in open innovation, that the depth and width of open innovation should be matched with the knowledge content of innovation cooperation (bengtsson et al., ) . last but not the least, this study elaborates platform openness into supply-side openness and demand-side openness because the platform is an architecture connecting suppliers and users (gawer, ) . existing literature mainly focus on the effect of supply-side production and diverse service on platform performance based on transaction attribute of platforms (casadesus and hałaburda, ). the co-creation value of demand-side users is necessary for innovation attribute of platforms (west, ; sheth, j. n. ) . therefore, this study takes both supply-side openness and demand-side openness into consideration. the particularity of ipes lies in the open interface. different from traditional enterprises which only choose their own business boundaries, ipes need to control their ecological boundaries through selecting the levels of platform openness. on the one hand, platform managers need to know that the relationship between openness governance and platform performance is nonlinear. moreover, platform managers need to think about the openness decision as a simultaneous consideration of supply-side openness and demand-side openness. the fundamental guideline for selection is that supply-side openness should pay more attention to the diversity of the transaction of products and services, while demand-side openness should emphasize more on the innovation participation of platform users. on the other hand, the selection of platform openness should involve the consideration of the evolution of platform attributes and should be carried out in a style of rapid iteration. for a platform with more focus on transaction attribute, it is more critical to match high demand diversity with high supply-side openness. for a platform with more focus on the innovation attribute, high knowledge complexity requires both high supply-side openness and high demand-side openness. in addition, the importance of ipes becomes more obvious with the global spread of covid- . without the support of ipes, a large number of transaction and innovation activities cannot even take place. for example, recently our shopping and consumption needs are greatly supported by alibaba and amazon. in addition, "fake news" becomes easier to occur during the epidemic, and consequently tencent and facebook have to think about strengthening platform governance. as such, enterprises probably need to invest in the construction of internet platforms and conduct differentiated governance according to their platform attributes even after the epidemic. 
on the other hand, government departments can provide policy support for platform infrastructure construction, aiming to help enterprises use new technologies such as g, cloud services and blockchain at a low cost to accelerate their digital transformation. this study has several limitations, and future research can be developed in three directions as follows. firstly, external environment variables have not been taken into consideration in the model, such as the impact of government policy and the competition among participating firms. the investigation of the antecedents of platform performance can be extended by including the competitive strategy of ipes and the impact of the external institutional environment. for example, jd.com improves the quality of products by adopting low openness, while taobao.com satisfies the diversified demands of users by adopting high openness. in addition, government policy has an important impact on the platform openness decision. platforms have strong market power because of the monopolistic competition that results from network effects. in this case, governments will set controls over the openness strategy taken by a platform once it engages in discriminatory behavior. secondly, large-sample analysis with objective data should be taken into consideration in future research. the data analyzed in this study comes from a coding method rather than from objective data. future studies can consider using objective data to validate the robustness of the research. multiple methods can be used to carry out repeated studies and to test the external validity of the research conclusions by increasing the sample size and diversity. the applicability of these research conclusions to other countries and regions can be further validated. lastly, the dynamic evolution of platform openness strategy and platform attributes has not yet been considered. in reality, the development of platforms is a process of dynamic evolution, and platform attributes keep changing in different stages. for example, taobao is no longer a simple online trading platform and has gradually evolved into an innovation ecosystem (zeng, ). therefore, a process model is needed in future studies to capture the dynamic nature of platform openness strategy.
coding criteria for demand-side openness:
- the platform sets a high entry threshold for demand-side users; a small number of demand-side users are occasionally involved in the platform architecture and in the custom development of transaction business.
- the platform admits demand-side users by invitation only; demand-side users do not participate in the platform architecture or in the custom development of transaction business.
coding criteria for demand diversity:
- the platform is a cross-cutting leader with a very diverse range of products and services, with existing users covering all user groups (clustered by gender, age and education).
- the platform is a large, cross-domain platform that offers a wide range of products and services that vary widely from user to user (at least in terms of gender, age, or education).
- the platform is the leading platform in a segmented field; products and services are more concentrated, and users are more homogeneous.
- the platform is a specialist platform in a subdivided field; products and services are narrow, and users are highly homogeneous.
coding criteria for knowledge complexity:
- the platform requires participatory innovation among multiple actors; the knowledge exchanged between actors cannot be documented; communication is rarely accomplished through written documents; and the knowledge is almost entirely tacit, skill-like and difficult to decompose.
- the platform encourages participatory innovation among multiple actors; the knowledge exchanged between actors is hard to document; part of the communication is difficult to accomplish through written documents; and the knowledge is more skill-like and difficult to decompose.
- the platform mainly hosts transactions between multiple actors; the knowledge exchanged between actors can be documented; communication can be completed through written documents; the knowledge is mostly explicit and can be decomposed in a modular way.
- almost all of the platform's activity consists of transactions between multiple actors; the knowledge exchanged between actors is documented; communication is completed through written documents; almost all of the knowledge is explicit and can be decomposed in a modular way. (nonaka, ; hansen, )
coding criteria for platform performance:
- the platform has a very large business user base (tens of millions), or the platform brand holds a leading position in its field (top two).
- the platform has a large business user base (millions), or the platform brand holds a leading position in its field (top five).
- the platform has a small business user base (around one hundred thousand), or the platform brand is at a middle level in its field (able to enter the relevant leaderboards).
- the platform has a very small business user base ( , and below), or the platform brand is at a lower level in its field.
interview evidence:
- ulibuy: the minister of strategic investment said, "our customers are diverse, with a wide range of ages and income levels, and their needs are even more varied. however, our product channels are limited, and our control thresholds are high, so supply-side products and services cannot serve our customers well."
- ctrip: the business manager said, "our customers' travel needs are diverse, including hotels, airline tickets, tickets, etc. thus, we take a highly open strategy to bring in a large number of external suppliers, so that our platform can achieve better matching and serve users well." (high supply-side platform openness matching with demand diversity leads to high performance.)
- moocollege: "we are an online education community, and knowledge sharing and creation require the participation of both supply-side and demand-side users," said the coo. "we thought that the previous strategy may not be appropriate and that we were not open enough to course providers, so the interactions did not form well."
- miui: the business manager said, "we built an online developer community, introduced a large number of enthusiast users, connected them with product developers, promoted user participation in innovation, and accelerated product iteration with product development companies. this business model helped us to succeed." (high supply-side and demand-side openness matching with knowledge complexity leads to high performance.)
closed or open platform? the nature of platform and a qualitative comparative analysis of the performance effect of platform openness
industry platforms and ecosystem innovation
balancing platform control and external contribution in third-party development: the boundary resources model
what is platform governance? information
ceo compensation in relation to worker compensation across countries: the configurational impact of country-level institutions
interorganizational trust, governance choice, and exchange performance
pricing and commitment by two-sided platforms
two-sided platforms: product variety and pricing structures
the search-transfer problem: the role of weak ties in sharing knowledge across organization subunits. administrative science quarterly
governance practices in platform ecosystems: navigating tensions between cocreated value and governance costs
social capital, networks, and knowledge transfer
towards a theory of ecosystems
testing innovation systems theory using qualitative comparative analysis
the origins of order: self organization and selection in evolution
knowledge of the firm, combinative capabilities, and the replication of technology
open for innovation: the role of openness in explaining innovation performance among uk manufacturing firms. strategic management journal
platform strategy framework for internet-based service development: case of ebay
keeping it all in the family: the role of particularistic relationships in business group performance during institutional transition
embracing causal complexity: the emergence of a neo-configurational perspective
innovation problems and search for solutions in crowdsourcing platforms: a simulation approach
a dynamic theory of organizational knowledge creation. organization science
user preferences and strategic interactions in platform ecosystems
platform revolution: how networked markets are transforming the economy--and how to make them work for you
innovation, openness, and platform control
redesigning social inquiry: fuzzy sets and beyond
knowledge collaboration between organizations and online communities: the role of open innovation intermediaries
demand heterogeneity in platform markets: implications for complementors
set-theoretic methods for the social sciences: a guide to qualitative comparative analysis
customer value propositions: value co-creation. industrial marketing management. forthcoming
the importance of collaborative know-how: an empirical test of the learning organization
research commentary-platform evolution: coevolution of platform architecture, governance, and environmental dynamics
architectural leverage: putting platforms in context
platform ecosystems: aligning architecture, governance, and strategy
lead users: a source of novel product concepts
"sticky information" and the locus of problem solving: implications for innovation
regulating platform competition in two-sided markets under the o o era
how open is open enough?: melding proprietary and open source platform strategies
the economic institutions of capitalism
collaborative innovation network and knowledge transfer performance: a fsqca approach
case study research: design and methods
from buzz to bucks: the impact of social media opinions on the locus of innovation
internet platform enterprises have the ambidextrous attributes of transaction and innovation.
this study elaborates the concept of platform openness into supply-side openness and demand-side openness, based on the idea that a platform is an architecture connecting suppliers and users.
demand diversity and knowledge complexity, as contextual variables, influence the relationship between platform openness and performance.
this research is supported by zhejiang provincial social science planning foundation (no.
key: cord- -l iacaec authors: iwamura, masakazu; inoue, yoshihiko; minatani, kazunori; kise, koichi title: suitable camera and rotation navigation for people with visual impairment on looking for something using object detection technique date: - - journal: computers helping people with special needs doi: . / - - - - _ sha: doc_id: cord_uid: l iacaec for people with visual impairment, smartphone apps that use computer vision techniques to provide visual information have played important roles in supporting their daily lives. however, they can be used under a specific condition only. that is, only when the user knows where the object of interest is. in this paper, we first point out the fact mentioned above by categorizing the tasks that obtain visual information using computer vision techniques. then, in looking for something as a representative task in a category, we argue suitable camera systems and rotation navigation methods. in the latter, we propose novel voice navigation methods. as a result of a user study comprised of seven people with visual impairment, we found that ( ) a camera with a wide field of view such as an omnidirectional camera was preferred, and ( ) users have different preferences in navigation methods. for people with visual impairment, the lack of access to visual information can cause difficulty in their daily lives and decrease independence. to mitigate it, smartphone apps that can tell the user visual information have been developed. vizwiz [ ] and be my eyes [ ] are apps that enable people with visual impairment to ask remote sighted workers or volunteers in supporting them. envision ai [ ], taptapsee [ ] and seeing ai [ ] are apps that use computer vision techniques [ ] to obtain visual information. as of the time of writing this paper, many people with visual impairment use these apps except vizwiz. this paper focuses on the latter approach, i.e., the apps that use computer vision techniques. while it has not been argued before, they can be used under a specific condition only. it is only when the user can photograph the object of interest by oneself. let us confirm this. to take a photo of an object, the user has to know where it is. of course, the purpose of using the apps is to know what it is. hence, these apps are used only when "what (it is)" is unknown and "where (it is)" is known. extending this idea, we find "what" and "where" indicate "what it is" and "where it is," respectively. the following three types of visual information exist, as summarized in table :
(i) "what" is unknown; "where" is known. representative task: obtaining the visual information on the object that the user photographs. required tools and techniques: current smartphone apps that use computer vision techniques such as [ , , ] can be used.
(ii) "what" is known; "where" is unknown. representative task: looking for something. required tools and techniques: it is better to use a camera with a wide fov such as a fisheye camera and an omnidirectional camera.
(iii) "what" is unknown; "where" is unknown. representative task: finding something valuable and unexpected to the user. required tools and techniques: it is better to use a camera with a wide fov, and the information provided to the user should be selected.
category (i): in this category, the user can photograph the object of interest by oneself. this type of visual information can be obtained by the current smartphone apps that use computer vision techniques such as [ , , ]. category (ii): a representative task of this category is looking for something. that is, the user knows what the user is looking for, but does not know where it is. as the user does not know where the object of interest is, the user cannot use the current smartphone apps in the same way as in category (i). it is because the user needs to move the smartphone here and there to take a photo of the object. hence, it is expected that using a camera with a wide field of view (fov), such as a fisheye camera or an omnidirectional camera, is better. as the user already knows what it is, differently from category (i), the app is expected to tell only where it is, if found. category (iii): in this category, the user does not expect that the app will provide any visual information to the user. however, if provided, the information is expected to be valuable to the user. concept-wise, it is similar to the recommendation systems used in e-commerce websites such as amazon.com, because it is expected to introduce products that are potentially interesting and unexpected to the user. thus, a representative task is finding something valuable and unexpected to the user. in a real-world scenario, the app is required to obtain as much visual information from all around the user as possible. hence, similarly to category (ii), it is expected that using a camera with a wide fov is better. a big difference from the other categories is that the amount of visual information potentially provided by the app can be large. in other words, the app may find multiple objects valuable to the user simultaneously. however, too much information is just annoying. hence, the amount of visual information to be provided to the user must be controlled. among them, we focus on category (ii) and discuss looking for something, which is a representative task of the category, in terms of the following two issues. the first issue is about cameras. in the task, we assume the user looks for a designated object around the user using an app that uses a computer vision technique to detect the object and guides the user to reach the target object. as the system needs to capture the object with the camera, the task is expected to become easier by using a camera with a wide fov, such as a fisheye camera or an omnidirectional camera. hence, in a user study, we investigate whether our expectation regarding the cameras is correct. the second issue is about rotation navigation methods. in turn-by-turn navigation, ahmetovic et al. [ ] have studied rotation errors and found that the participants tend to over-rotate the turns, on average, • (hereafter, deg.) more than instructed. they have concluded that simply notifying the user when the user reaches the target orientation, as they did in that research, is error prone, and that a different interaction, such as continuous feedback, is required. as a follow-up, ahmetovic et al. [ ] have investigated three sonification techniques to provide continuous guidance during rotation. however, it is not necessary to instruct by sound. hence, we introduce three voice instructions and investigate the users' preferences in the user study. in looking for something, we implement a computer-vision-based prototype system that guides the user to reach the target object in a step-by-step manner.
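before the individual steps are described, the following is a minimal python sketch of the kind of detect-rotate-approach loop such a prototype could run. it is not the authors' implementation: the helper names (capture_equirectangular, to_perspective_views, detect_objects, read_compass_deg, read_depth_m) stand in for the omnidirectional camera, the object detector, the electronic compass and the depth camera, the detection result fields (label, offset_deg) are assumed, and the stopping distance and angular tolerance are placeholder values rather than the ones used in the paper.

```python
import time


def speak(text):
    # stand-in for text-to-speech output
    print(text)


# assumed values; the paper's actual thresholds are not reproduced here
STOP_DISTANCE_M = 0.5
ALIGNED_TOLERANCE_DEG = 10.0


def signed_angle_deg(target_heading, current_heading):
    """smallest signed difference in degrees; positive means the target is to the right."""
    return (target_heading - current_heading + 180.0) % 360.0 - 180.0


def find_target_bearing(category, capture_equirectangular, to_perspective_views, detect_objects):
    """step 1: detect the designated object in perspective views rendered from an
    equirectangular frame and return its absolute bearing, or None if not found."""
    speak("searching an object. please stay and wait.")
    frame = capture_equirectangular()
    for view_heading, view in to_perspective_views(frame):
        for det in detect_objects(view):  # det.label and det.offset_deg are assumed fields
            if det.label == category:
                speak("detected.")
                return (view_heading + det.offset_deg) % 360.0
    return None


def guide_rotation(target_heading, read_compass_deg):
    """step 2: continuous voice feedback ("left"/"right"), i.e. the simplest of the
    navigation methods compared later, until the target is roughly in front."""
    while True:
        diff = signed_angle_deg(target_heading, read_compass_deg())
        if abs(diff) <= ALIGNED_TOLERANCE_DEG:
            speak("in front of you.")
            return
        speak("right" if diff > 0 else "left")
        time.sleep(0.4)


def guide_approach(read_depth_m):
    """step 3: read out the remaining distance until the user is close enough."""
    speak("measuring the distance.")
    while (d := read_depth_m()) > STOP_DISTANCE_M:
        speak(f"{d:.1f} meters")
        time.sleep(1.0)
    speak("the object exists near you.")
```

passing the sensor and detector helpers in as parameters keeps the sketch self-contained; a hosting application would supply callables backed by the omnidirectional camera driver, the detector and the depth camera.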
-step : object detection the system detects an object of the designated category in the captured image. in the user study, we designated easy-to-detect object categories, but only one instance existed in the room, such as a laptop and a bottle. once the object detection method outputs the bounding box of the target object, the direction of the target object from the user is recorded. the user rotates on the spot until the target object comes in front. by comparing the output direction of the electronic compass with the direction of the target object, the system guides the user to rotate using a rotation navigation method. with the guidance of the system, the user advances toward the target object and stops in front of the object. it uses the depth camera to measure the distance to the target object, and speaks the distance periodically, like " . m, . m, ..." it ends when the user reaches a distance of . m. the implemented prototype system consisted of a laptop computer (macbook pro) and a camera system in fig. . as shown in fig. (a), one consisted of an omnidirectional camera (ricoh theta z ) used in step of the above procedure, an electronic compass (freescale mag installed on bbc micro:bit), and a depth camera (intel realsense d ). the electronic compass was used in step to quickly sense the user's direction and promptly give the user feedback. the depth camera was used in step to measure the distance to the target object. the other was a pseudo smartphone shown in fig. (b) . instead of the smartphone's embedded camera, we used a web camera (logicool hd webcam c ) in step . we used the same electronic compass and depth camera for a fair comparison. to detect the target object, we ran a pytorch implementation [ ] of you only look once (yolo) version [ ] , which is a representative object detection method, on the laptop computer. it was trained on coco dataset [ ] consisting of object categories. as the object detection method assumes to input a perspective image, the image captured with the omnidirectional camera was converted to eight perspective images in the same manner as [ ] . the prototype system speaks its current state, like "searching an object. please stay and wait." "detected." "measuring the distance." and "the object exists near you." ahmetovic et al. [ ] have introduced the following three sonification techniques that provide continuous guidance during rotation. intermittent sound (is) triggers impulsive "beeping" sounds at a variable rate, which is inversely proportional to the angular distance, like a geiger-müller counter. amplitude modulation (am) employs a sinusoidal sound, modulated in amplitude by a low frequency (sub-audio) sinusoidal signal. the frequency of the modulating signal is inversely proportional to the angular distance, producing a slowly pulsing sound at large angular distances, which becomes stationary when the target is reached. musical scale (ms) plays eight ascending notes at fixed angular distances while approaching the target angle. they concluded that is and ms when combined with ping (impulsive sound feedback emitted when the target angle is reached) were the best with regard to rotation error and rotation time. we examine the following five (three voice and two sound) navigation methods. left or right (lr) repeatedly (approximately . times per second) tells the direction toward the target object, i.e., "left" or "right." when the target object comes within • . in front of the user, it tells "in front of you." 
angle (ag) repeatedly tells the relative rotation angle to the target object, followed by "left" or "right." the front of the user is always regarded as • . for example, if the target object exists at an angular distance of • . on the right-hand side of the user, the system speaks " • , right." after the user rotates by • , it speaks " • , right." in front of the target object (within • ), it tells "in front of you." clock position (cp) is similar to ag but uses the clock position. taking the same example as ag, it speaks " o'clock." in front of the target object (within • ), it tells "in front of you." intermittent beep (ib) is similar to is of [ ] . it triggers impulsive "beeping" sounds at a variable rate, which is inversely proportional to the angular distance. the rates in the front ( • ) and back ( • ) were approximately hz and . hz, respectively. ib is designed to use earphones; beeps are played on only the left or right earphone to indicate the rotation direction. when the target object comes within • in front of the user, it plays beeps sounds at a rate of approximately hz on both earphones. pitch (pt) plays sounds with a variable pitch. in our implementation, the front and back pitches were hz and hz (six and three times of c in scientific pitch notation), respectively. in contrast with ms of [ ] that plays eight discrete notes, pt plays continuous notes. same as ib, pt plays sounds on only the left or right earphone to indicate the rotation direction. in front of the target object, pt behaves in the same manner as ib. we performed a user study comprised of seven people with visual impairment. as summarized in table , the participants consisted of four males and three females, ages to . six were totally blind, and one had low vision. the user study consisted of the following four parts. we told the participants that our research topic was looking for something and gave a brief overview of the experiments. we asked the participants about looking for something. this interview was performed for every two persons except a. that is, the interview groups were table . answers of q and q are shown in tables and . the answers of the participants are summarized as follows. five out of seven participants lived together with someone (q ). among them, two lived with sighted persons (q ). while they all looked for something every day (q ), they did not encounter trouble every day (q ). they all looked for something at home, and three did it in other places (office or school, and outside) (q ). they all groped to look for something, expecting to find it in arm's reach, while four asked a sighted person if available (q ). five mostly looked for a smartphone, see table q . how long does it take to find lost stuff? ( : within min., : - min., : more than min.) q . where do you find lost stuff? see table q . what causes you to look for something? while earphones and other stuff were also often looked for (q ). required time to look for lost stuff was of variety (q ). some answered that they gave up looking for if it took more than min. (q ). the lost stuff was found in the pocket of a jacket and a bag, and on a chair and a table (q ). losing stuff was mostly caused by wrongly remembering and forgetting where it was placed (q ). they all answered that their remedy to avoid losing stuff was to fix the place, while two answered to keep the room clean (q ). g in many cases, on the floor. a snapshot during the experiment. 
the experimenter holding the laptop computer stands behind the participant to prevent the camera from capturing him. in this case, the laptop computer at the bottom was the target object. differently from the pre-study interview, the following two experiments were performed for each participant. in this experiment, we asked participants to use five rotation navigation methods one by one through steps (object detection using the omnidirectional camera) and (rotation navigation) in sect. . . as ib and pt were designed to use earphones, for a fair comparison, participants used earphones for all navigation methods. figure shows how the experiment was performed. table shows their preferences on a -point scale, in which a large number means better. besides, their comments on the five navigation methods and ideas about easy-to-use navigation methods are shown in tables , , , , and . a the resolution in angle was too detailed. i prefer cp. b while the resolution was too detailed, this way was easy to get, as the angle is absolute. c though it seemed not to cause trouble and easy to get, it was not easy for me to imagine how much i should rotate. d if i get used to this way, it would be the safest choice. e i needed to be strategic. it took some time to think about how much i should rotate after hearing the angle. f simple. though i could get the angle, i could not immediately imagine how much i should rotate. i may need to get used to it. g it was easy to get when i needed to stop, as spoken angles were decreasing. a this way was the easiest to get. c this way was easy to imagine both rotation direction and angle. d i need to get used to this way. e as i am used to this way, i could imagine how much i should rotate. but, the resolution in angle maybe too rough. f as i am used to the clock position, this way was very easy to understand, so that i could reach the target direction immediately. g i needed to think about which direction o'clock is. it is because i am not used to it. table shows that the participants' preferences were of variety. that is, all navigation methods except lr were selected as the best by at least one participant. related results are reported in two papers; musical experience affects the users' behavior [ ] ; expertise affects interaction preferences in navigation assistance [ ] . in our experiment, while we did not ask their expertise, from their comments , we can see that the participants have their compatibility with g i had the impression that i was approaching the target. i felt it was trustable. a the pitch of the first sound was too low. to me, the high pitch did not link to getting close to the target. b the pitch of the last sound was too high. compared to ib, it was not easier to expect the target angle. using both ears is negative. c intuitively, it was easy to get, even without hearing a voice. it would be usable in a noisy place. d i like this way, while i think this way requires a sense of pitch. this way is used in a screen reader (nvda). e this way was the best among the five methods. i could get feedback immediately. i could find the target sensuously. f as hearing the sound in either of my ears made me confused, voice navigation was better. i could not imagine how much i should rotate, as i could not see how the pitch became when i approached the target. i could notice that i rotated too much when i heard the sound in the other ear. g though this way was easy to get, i could not distinguish sounds in detail. 
if it takes long to find the target object, my ears will hurt. navigation methods. these imply that no single best method for everyone exists, and personalization of user interfaces is vital. we also asked the participants if they hesitate to wear earphones on both ears, and found that one (d) did not hesitate, four (a, c, f, and g) did not if they are at home, two (b and e) did. to be strategic, it is better to tell the angle first. then, using a sensuous method such as the duration or interval of vibration. it is also ok that a band wrapped around the belly tells the direction by vibrating the target direction part. f while vibration is a possible solution, i think cp is the best. g voice with vibration would be able to be used in a noisy place. we asked participants to use each of the two camera systems and complete the -step finding process in sect. . . they used the best navigation method selected in experiment for each participant but had the freedom to use or not to use earphones. table shows an omnidirectional camera was preferred by six, while the pseudo smartphone by one. tables and show the participants' comments on the camera systems. six (all but c) commented on the difficulty of using the pseudo smartphone in looking for something. in contrast, they all, including participant c who preferred the pseudo smartphone, found advantages of the omnidirectional camera, while three (a, c, and f) commented its heaviness. hence, we conclude the omnidirectional camera has advantages in the task. in this paper, we focused on apps that use computer vision techniques to provide visual information. we pointed out that the current smartphone apps can only be used under a specific condition, and categorized the tasks of obtaining visual information into three. as a representative task of a category, we focused on table . evaluation of a camera on a -point scale. a it was a bit heavy. if it was not heavy, it was the most convenient. b to find the object, i did not have to rotate. even so, i would buy the smartphone. c while it seems convenient and i think it can find the stuff quicker, requiring a particular device (i.e., omnidirectional) and its heaviness were negative points. d it was unexpectedly good, as it told me how far in the angular distance to the object, and found the object earlier than the smartphone. e it was good, as it told me the direction of the target object. f it was heavy. finding the bottle took time more than i expected. while it found the laptop computer quickly, the "front" the system said was slightly different from my real front. g it was convenient, as i did not have to rotate. it was more accurate and quicker than i expected. i want to use this. the distance to the object was not important. a if i have to find the object by moving the smartphone, i prefer to grope. the response was slow. quicker is better. b i had to rotate to find the object. even if the system did not find the object, i could not judge if it exists in the room (the omnidirectional camera is the same). c an advantage is easy to introduce, as i can use my smartphone. easy to hold. d i expected the smartphone was better. however, i needed to adjust the angle. e the system could not find the object unless it captures it, which frustrated me. as i could not see how quickly the system processed an image, i could not see how fast i could rotate. while it was faster than groping, it took time. f while the camera was not heavy, it is not suitable for looking for something. 
in real use, if i can roughly guess the direction of the object, i may be able to use this. if not, groping is better. g it was hard to capture the target object, as i needed to take care of horizontal rotation and vertical rotation. i prefer to grope. looking for something. in the task, we proposed a prototype system that used an omnidirectional camera and the use of voice in rotation navigation. a user study comprised of seven people with visual impairment confirmed that ( ) a camera with a wide fov is better in such a task, and ( ) users have different preferences in rotation navigation. the latter implies that no single best method for everyone exists, and it is vital to personalize user interfaces. sonification of rotation instructions to support navigation of people with visual impairment impact of expertise on interaction preferences for navigation assistance of visually impaired individuals turn right: analysis of rotation errors in turn-by-turn navigation for individuals with visual impairments vizwiz: nearly real-time answers to visual questions visphoto: photography for people with visual impairment as post-production of omni-directional camera image computer vision for assistive technologies microsoft coco: common objects in context yolov : an incremental improvement open access this chapter is licensed under the terms of the creative commons key: cord- -dckqb er authors: murillo-morales, tomas; heumader, peter; miesenberger, klaus title: automatic assistance to cognitive disabled web users via reinforcement learning on the browser date: - - journal: computers helping people with special needs doi: . / - - - - _ sha: doc_id: cord_uid: dckqb er this paper introduces a proof of concept software reasoner that aims to detect whether an individual user is in need of cognitive assistance during a typical web browsing session. the implemented reasoner is part of the easy reading browser extension for firefox. it aims to infer the user’s current cognitive state by collecting and analyzing user’s physiological data in real time, such as eye tracking, heart beat rate and variability, and blink rate. in addition, when the reasoner determines that the user is in need of help it automatically triggers a support tool appropriate for the individual user and web content being consumed. by framing the problem as a markov decision process, typical policy control methods found in the reinforcement learning literature, such as q-learning, can be employed to tackle the learning problem. accessibility to the digital world, including the web, is increasingly important to enable people with disabilities to carry out normal lives in the information society, something that has been acknowledged by the united nations and many individual governments to be a right for people with disabilities. this is as true for people with cognitive, language, and learning differences and limitations as it is for anyone else [ ] . nowadays, many web users suffering from a cognitive or learning disability struggle to understand and navigate web content in its original form because of the design choices of content providers [ ] . therefore, web content often ought to be adapted to the individual needs of the reader. currently available software tools for cognitive accessibility of web content include immersive reader [ ] , the read&write browser extension [ ] , and easy reading [ ] . 
these tools embed alternative easy-to-read or clarified content directly into the original web document being visited when the user requests it, thereby enabling persons with a cognitive disability to independently browse the web. access methods may be tailored to the specific users based on personal data, generally created by supporting staff or educators [ ] . besides these semi-automatic tools, current approaches to making websites accessible to people with cognitive and learning impairments still mostly rely on manual adaptations performed by human experts [ ] . the easy reading framework improves cognitive accessibility of original websites by providing real time personalization through annotation (using e.g. symbol, pictures, videos), adaptation (e.g. by altering the layout or structure of a website) and translation (using e.g. easy-to-read, plain language, or symbol writing systems) [ ] . the main advantage of the easy reading framework over existing cognitive support methods is that the personalized support tools are provided at the original websites in an automatic fashion instead of depending on separate user experiences which are commonly provided to users in a static, content-dependent manner and that must be manually authored by experts. easy reading software clients have been designed as web browser extensions (for mozilla firefox and google chrome) and mobile os apps (android and ios). the main interaction mechanism between the user and the client consist on a graphical user interface (gui) that the user may choose to overlay on top of any website being currently visited. a number of tools, personalized to the specific user, are available to the user in easy reading's gui (see fig. ). the user may choose at any time to use some of the available framework functions by triggering their corresponding tool by clicking on the available buttons of the gui. given the special needs of easy reading's user base, having a traditional gui as the only interaction mechanism between the user and the browser extension may not suit the specific needs of all users. some users, especially those suffering from a profound cognitive disability, may not possess the necessary expertise and/or understanding to interact with easy reading's gui. this is particularly the case if there are many tools being overlaid on the gui, as this may overwhelm the user given the considerable amount of personalization mechanisms to choose from. the use of easy reading is also restricted for those suffering from additional physical disabilities making interaction slow or impossible when no easy to use at solutions are at hand. we therefore aim to assist the user in choosing and using the right cognitive support tool when he or she is in need of help while navigating web content which appears to be confusing or unclear. we have expanded the easy reading framework so that it supports the automatic triggering of any support tool with the addition of two components; namely, ( ) a user data collection module and ( ) a client-based reasoner that learns about the mental state of the user based on the gathered data and previous experiences, and reacts accordingly by triggering support tools when necessary. figure displays the interaction between these two components within the easy reading framework. the next section gives a short overview on current methods for automatically detecting the cognitive load/affect of a person from collected user data. 
based on some of these results, the design of the easy reading user tracking and reasoning framework is outlined in the remaining of this document. affect recognition is the signal and pattern recognition problem that aims to detect the affective state of a person based on observables, with the goal of, for example, providing reasoning for decision making or supporting mental well-being [ ] . terms such as affect and mood elude a precise definition in the literature, but some working definitions may be characterized. namely, affect is a neurophysiological state that is consciously accessible as the simplest raw, nonreflective, primitive feeling evident in mood and emotions e.g. the feeling of being scared while watching a scary movie [ ] . the easy reading graphical user interface (gui) overlaid on a website. the symbol support tool has been automatically triggered on a text paragraph by the easy reading reasoner, adapting its content automatically with symbol annotations over the original text. the user may reject automatically given help by means of an onscreen dialogue (top right). any of the available tools on the easy reading gui may be also manually triggered by the user at any given time by clicking on its corresponding button. on the other hand, emotions are intense and directed indicators of affect e.g. shock and scream are emotions that indicate the affect of being scared [ ] . as opposed to emotions, moods are less intense, more diffuse, and last for a longer period of time than emotions [ ] . for example, the emotion of anger, which does not last long by itself, can lead to an irritable mood [ ] . on this paper we focus on the binary classification problem of the user's affect, namely, whether the user is in a confused mental state during a web browsing activity. this problem is closely related to that of stress detection, in which data is analyzed to predict the stress level of a person, generally as a binary variable (stressed/unstressed). stress detection is a well-researched problem that can be reliably undertaken by analyzing user physiological signals provided by e.g. a wrist-worn device such as a smartwatch [ ] . affect is a subjective experience of a person which is generally detected through self-reporting. nevertheless, numerous approaches that aim to infer a person's affective state in an automatic fashion can be found in the literature. these approaches can be divided into four categories depending on the kind of data they process: • contextual approaches learn from the interaction between the user and a software system by e.g. analyzing mouse gestures or page visit times. • physiological approaches collect and analyze physiological data from the user, such as heart beat rate or skin temperature. • text-based approaches process and interpret the textual contents of speech spoken or written by the user using natural language processing (nlp) techniques for sentiment analysis generally based on supervised machine learning methods. • audio-visual approaches study recorded audio (generally speech) or video (of e.g. the user's face or full body) while the user interacts with the system. preferably, affect recognition systems should employ multimodal data i.e. fusion analysis of more than one input modality, since multimodal affect recognition system are consistently more accurate than unimodal methods [ ] . a collection of state-of-theart methods for affect recognition can be found in [ , , ] . 
the vast majority of these methods rely on supervised machine learning models such as deep neural networks (dnn) for image analysis e.g. for analyzing the user's facial expressions; or random forests (rf) and support vector machines (svm) for analysis of physiological signals e.g. heart rate variability. what these methods have in common is that they require of big amounts of training data that the learning model must be trained on. available datasets such as the well-known deap dataset [ ] aim to simplify this process by providing a large amount of pre-labelled training data for affect recognition tasks. for a list of available datasets the reader is directed to [ ] and [ ] . however, these approaches, especially those relying on physiological signals, suffer from a number of drawbacks that hinder their application in practice: • even if supervised models perform well when tested on known users, they exhibit high generalization errors when tested on unknown users, and thus models must be fine-tuned to the specific user [ ] . • available datasets have been collected using a specific combination of devices and sensors e.g. a specific wristband. therefore, end users are forced to acquire a very similar combination of devices to make use of models trained on such datasets. preferably, the reasoner model should adapt to the available hardware, not the other way around. • many tracking devices employed to collect data for these datasets are too expensive or obtrusive to be used in an informal home/office setting by end users, such as eeg headsets. this section introduces our approach to automatic affect detection tailored to the specific user and available devices that aims to overcome some of the issues described in the previous section. in order to detect whether the user is confused or mentally overwhelmed by the content he or she is visiting during an ordinary web browsing activity, user data needs to be collected in a transparent, unobtrusive manner to the user. in addition, specific tracking devices whose presence in a common household or office environment would normally be unwarranted (due to e.g. high cost) ought to be avoided. therefore, after a study of the relevant literature and filtering out those tracking devices which did not satisfy these requirements, the following signals were considered: • eye movement and position. the current position of the user's gaze on the screen and the voluntary or involuntary movement of the eyes can be collected with the use of inexpensive eye trackers, commonly used in gaming, that can be mounted near a computer's screen in close distance to the user. eye movement data is of great utility to ascertain cognitive load. some authors even argue that eye movement data suffices to infer the cognitive demand of tasks being carried out by a person [ ] . • blink rate. the time period between two or more consecutive eye blinks can be a good indicator of task difficulty as well. for example, a poor understanding of the subject matter in a lecture on mathematics resulted, for some persons, on an increased number of rapid serial blinks [ ] . blink frequency can be easily measured by, for example, analysing a video of the user's face recorded with a typical laptop webcam. • heart rate. 
the current heart rate (hr) of the user, measured in beats per minute (bpm), and especially heart rate variability (hrv), which describes the variation of the time between heartbeats, is a rather simple but effective measure of the current affective state and stress level of the user [ ] . these dimensions can be easily determined with the use of commercial smartwatches and fitness trackers, which are the most popular wearable devices being sold nowadays. • implicit behavioural information. several measures of user behaviour on websites that aim to predict disorientation, task difficulty and user preferences can be found in the information retrieval literature. for example, time spent on site and clickthrough rates have been used to measure the cognitive load of users visiting a web search engine [ ] . it is however important to note that other studies have concluded that user disorientation on websites is only weakly related to user behaviour [ ] . therefore, physiological signals are employed as the main data source employed by easy reading's reasoner module. user data are processed and gathered by the easy reading framework as follows. eye fixation duration (in milliseconds) and current x and y coordinates of the user gaze on the screen is measured by a tobii c eye tracker . to measure hr and hrv, the vivoactive smartwatch by garmin was selected as a good compromise between accuracy and affordability. blink rate can be measured from video data recorded from any standard webcam, whether integrated in a laptop or an external one. input signals are gathered and processed in an easy, flexible, and tailorable manner by means of an asterics model. asterics [ ] is an accessible technology (at) construction set that provides plug-ins for many common input devices and signal processing operations. by combining already existing and newly developed plug-ins into an asterics model, raw input physiological signals are pre-processed before being sent to the reasoning module. pre-processing includes methods for synchronization of data streams, handling of missing values (e.g. if the user does not possess some of the input devices), noise removal, and segmentation of the collected data into batches. several pre-processing parameters can be adjusted directly in the asterics model by end-users or carers without the need of possessing technical skills. for example, batch (temporal window) size is by default set to s (e.g. samples aggregated after s each) following state-of-the-art recommendations [ ] , but can be easily adjusted by modifying the relevant parameters of the easy reading asterics data collector plug-in. collected batches are next converted to json objects and sent to the easy reading browser extension via a secure websocket connection maintained by the asterics runtime environment (are) web server. the easy reading reasoner is the client-based module in charge of solving the problem of inferring the affective state of the user from the current readings of physiological signals collected by a running asterics model. the reasoner is hosted on the client in order to minimize the amount of messaging needed between the distributed components of the user tracking and reasoning framework, which in turn results in more responsive reasoner actions. this however comes at the cost of a more limited computational capacity, as the whole learning model has to run on the user's browser. 
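The JSON batches produced by this pipeline are what the reasoner ultimately observes about the user, so the aggregation step can be pictured as collapsing one temporal window of raw readings into a single keyed record. The sketch below illustrates that idea in Python; the real pre-processing runs inside the AsTeRICS model rather than in the extension, the feature names and aggregation choices are hypothetical, and the window length is omitted because the default value is not reproduced in the text.

```python
import json
from statistics import mean, mode

# hypothetical features and how each one is aggregated over a window
AGGREGATION = {
    "heart_rate_bpm": "mean",
    "hrv_ms": "mean",
    "blink_rate_per_min": "mean",
    "fixation_ms": "mean",
    "gaze_target": "mode",   # e.g. most-stared-at content type in the window
    "gaze_x": "last",
    "gaze_y": "last",
}

def aggregate_window(samples: dict, window_id: int) -> str:
    """Collapse one temporal window of raw samples into a single JSON batch.
    `samples` maps feature name -> list of values seen inside the window; a missing
    or silent device simply yields None for its feature."""
    batch = {"window": window_id}
    for feature, how in AGGREGATION.items():
        values = samples.get(feature, [])
        if not values:
            batch[feature] = None
        elif how == "mean":
            batch[feature] = round(mean(values), 2)
        elif how == "mode":
            batch[feature] = mode(values)
        else:  # "last": keep only the most recent reading
            batch[feature] = values[-1]
    return json.dumps(batch)

print(aggregate_window({"heart_rate_bpm": [72, 75, 74],
                        "gaze_target": ["text", "text", "image"]}, window_id=7))
```

Each such batch plays the role of the observed user state in the decision process described next.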
we have adopted a markov decision process (mdp) as the framework for the problem, which allows it to be theoretically solved using a number of well-established control learning methods in the reinforcement learning (rl) literature. as previously stated, research shows that affection/stress recognition methods must be tailored to the individual differences of each person. given that rl is specially well suited to problems in which the only way to learn about an environment is to interact with it, we model the environment shown in fig. as a mdp to be solved for a specific user. for a detailed characterization of mdps and rl, the reader is directed to [ ] , chapters and . like every mdp, our problem consists of an agent, (intractable) environment, state set (s), action set (a), and policy (p), characterized as follows. the agent in a mdp is the learner and decision maker. it corresponds to the reasoner module being executed in the background script of the browser extension, as shown in fig. . at any given time step, t, the reasoner observes the current state of the user, s t , as specified by each sample being delivered by the asterics model, and decides on an action, a t , to be taken with probability p i.e. p a t js t ð Þ ¼ p. the current status, s t , is the json object produced by the data collector model, which consists on a number of features e.g. hrv, and the current value of the user readings for that feature. note that t does not correspond to an exact moment in time, but rather to the latest time window that has been aggregated by the data collector. depending on the feature, the data collector sets its value to the latest received input or an aggregation thereof, such as the average or most common value (mode) during the time window. the reasoner may next take one of three actions (a t ), namely: . no action (nop). the reasoner has inferred that the user is not in need of help at the moment, and thus no further action is necessary. . help user (help). the reasoner has inferred that the user is in need of help with some content of the website being currently visited, and a suitable framework tool needs to be triggered. figure displays an example of this action being triggered on a website. . ask user explicitly for the next action to take (ask). the reasoner is unsure about the next action to take, as it expects both nop and help actions to yield a low reward. in this case, it asks the user, via an onscreen dialogue, about which of these two actions to take next. the user gives feedback, which may be implicit or explicit, on the action just taken by the reasoner and a numeric reward, r t þ , is computed as a function of a t and the user feedback, as shown in table . this reward function heavily penalizes the case in which the agents fails to help a user in need. however, to prevent the agent from persistently asking the user for explicit feedback on the best action to take, asking the user is always given a (low) negative reward as well. moreover, since the correct s t ; a t ð Þpair is known after the user has answered a feedback dialogue, this combination is accordingly rewarded in order to speed learning up. the agent is only positively rewarded in the case that it manages to predict that the user is currently in need of help and automatically triggers the correct tool for his or her needs. 
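The reward rules just described can be written as a small function of the chosen action and the feedback gathered afterwards. The sketch below follows those qualitative rules only; the numeric magnitudes are placeholders, since the paper's actual reward table is not reproduced in the text.

```python
# placeholder magnitudes; only their ordering reflects the rules described above
R_HELP_ACCEPTED = +1.0   # triggered a tool and the user accepted the help
R_HELP_UNWANTED = -1.0   # triggered a tool the user did not want
R_MISSED_HELP   = -2.0   # did nothing although the user turned out to need help
R_NOP_CORRECT   =  0.0   # did nothing and the user indeed needed nothing
R_ASK           = -0.2   # asking is always slightly penalised to discourage constant dialogues

def reward(action: str, user_needed_help: bool, user_accepted_help: bool = False) -> float:
    """Numeric reward computed from the agent's action and the implicit or explicit
    user feedback collected afterwards."""
    if action == "help":
        return R_HELP_ACCEPTED if (user_needed_help and user_accepted_help) else R_HELP_UNWANTED
    if action == "nop":
        return R_MISSED_HELP if user_needed_help else R_NOP_CORRECT
    if action == "ask":
        return R_ASK
    raise ValueError(f"unknown action: {action}")
```

Any scaling with the same ordering (missed help penalised hardest, asking mildly negative, accepted help the only positive outcome) preserves the intended behaviour.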
currently, the support tool triggered by the reasoner is the most frequently used tool by the user for the content type (text or image) that has been stared at the longest during the last time window t. use frequency for each tool is computed and stored in each user's profile on easy reading's cloud backend. in the subsequent time step, t þ , the agent receives, along with the reward signal r t þ , a new user state, s t þ . this state is an aggregation of user data representing the user's reaction to a t . therefore, the easy reading extension does not start collecting a new state until the user's feedback has been gathered. consequently, any user data incoming during the processing of s t , and after a t has been yielded but before the user feedback has been inferred, is discarded. this whole process is summarized in the workflow diagram shown in fig. . the goal of the agent (reasoner) is to learn which sequence of actions leads to a maximum reward in the long run. this behavior is encoded in a so-called policy, which the reasoner has to learn. the policy, p ajs ð Þ, specifies which action to take on a given state, or, in the nondeterministic case, the probability of each action of the action space for a given state. the easy reading reasoner keeps an estimation of the value, in terms of future expected returns, of each action a performed on each state s it has seen so far, q p s; a ð Þ, known as the action-value function of p. this function can be stored as a table in memory if the set state is small. in this case, input observables are further pre-processed in the browser extension via data binning to obtain a manageable state set. otherwise, the state-value function can be approximated via function approximation methods e.g. by modelling it with a neural network (nn). the latter approach is possible by defining actor and/or critic nns with tensorflow.js directly on the user's browser . note however that due to the strict security policy of the firefox add-on store this possibility cannot be included in officially listed extensions, and therefore our extension currently only implements a tabular q-function. the policy that maximizes q p à s; a ð Þ, for all states and actions is known as the optimal policy, p à . some rl control methods, such as temporal-difference (td) learning methods, converge to the optimal policy given enough training time. the easy reading reasoner implements a number of value-based rl methods that aim to find p à ; namely, q-learning and double-q-learning. describing these methods is out of scope of this paper, for further information the reader is directed to [ ] . the basic q-learning update rule is shown in context in fig. . when interacting with a new user, the agent does not know anything about the environment (the user), and therefore it has to be explored. once enough knowledge is acquired, this knowledge can be exploited to maximize the rewards obtained from this point onwards. the agent follows a e-greedy behavioral policy with respect to q p s; a ð Þ, whose values are initialized to zero. however, instead of choosing a random action with e probability, it chooses the "no action" action both at exploration time and when state-action values are tied at exploitation time. this way, since most of the time the user will not be in a confused state, help tools and dialogues are less likely to come up at unsought times, which aims to reduce user frustration overall. 
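A minimal tabular version of this learner is sketched below: a Q-table initialised to zero, the one-step Q-learning update, and the modified e-greedy policy that falls back to "no action" both when exploring and when action values are tied. The learning rate, discount factor and exploration rate are assumptions (the text does not give them), states are assumed to be hashable tuples of binned features, and neither the double-Q variant nor the browser-side JavaScript implementation is shown.

```python
import random
from collections import defaultdict

class ReasonerSketch:
    """Illustrative tabular Q-learning agent with the 'quiet' exploration described above."""

    ACTIONS = ("nop", "help", "ask")

    def __init__(self, alpha: float = 0.1, gamma: float = 0.9, epsilon: float = 0.1):
        self.q = defaultdict(float)        # (state, action) -> value, initialised to zero
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select_action(self, state) -> str:
        # Explore by doing nothing rather than by triggering random tools or dialogues.
        if random.random() < self.epsilon:
            return "nop"
        values = {a: self.q[(state, a)] for a in self.ACTIONS}
        best_value = max(values.values())
        best_actions = [a for a, v in values.items() if v == best_value]
        # Ties (including the all-zero initial table) also resolve to "no action".
        return "nop" if len(best_actions) > 1 else best_actions[0]

    def update(self, state, action: str, reward: float, next_state) -> None:
        # One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + self.gamma * max(self.q[(next_state, a)] for a in self.ACTIONS)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

With every value starting at zero, "help" is only chosen once feedback has lifted its estimated value above the alternatives for some states.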
the system can then slowly learn to identify the states in which the user needs help as tools are manually triggered during training sessions with a caregiver where negative feedback is acquired. the implemented policy's pseudocode is shown in context in fig. . this article has introduced an innovative approach to automatically detecting the affective state of a web user in real time based on the analysis of physiological signals. the main goal of the easy reading reasoner is to infer the current cognitive load of the user in order to automatically trigger the corresponding assistance mechanism of the easy reading framework that would help the user in getting a better understanding of a difficult piece of web content (text or image). it is included in the latest version of the easy reading extension as a proof of concept and is ready to be tested with voluntary participants. training sessions with synthetic data have been carried out, yielding a very good accuracy of around % after time steps i.e. around h of real training time. however, detecting user confusion on actual users may prove much more challenging, since changes in physiological signals may be too subtle or complex to be properly modelled by the easy reading reasoner and the inexpensive consumer devices employed. it must be noted that initial informal evaluation with end users has shown that triggering support tools when the user does not need them should be avoided altogether, since it frustrates and confuses them to the point where they refuse to keep using the easy reading extension with reasoner support enabled. after this observation, we modified the agent's policy from a traditional e-greedy policy to the modified policy shown in fig. . the next step is to test our approach with end users in a laboratory setting. a review and meta-analysis of multimodal affect detection systems implicit measures of lostness and success in web navigation the easyreading framework -keep the user at the digital original deap: a database for emotion analysis using physiological signals policy and standards on web accessibility for cognitive and learning disabilities many facets of sentiment analysis tools and applications for cognitive accessibility importance of individual differences in physiological-based stress recognition models rapid serial blinks: an index of temporally increased cognitive load asterics, a flexible assistive technology construction set a review of affective computing: from unimodal analysis to multimodal fusion read&write literacy support software. texthelp wearable affect and stress recognition: a review detecting task demand via an eye tracking machine learning system continuous stress detection using the sensors of commercial smartwatch reinforcement learning: an introduction, nd edn. a bradford book ), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the creative commons licence and indicate if changes were made. the images or other third party material in this chapter are included in the chapter's creative commons licence, unless indicated otherwise in a credit line to the material. if material is not included in the chapter's creative commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use acknowledgments. 
this project has received funding from the european research council (erc) under the european union's horizon research and innovation programme (grant agreement no. ). key: cord- - fhb m authors: hashemian, mohammad r. title: advanced querying features for disease surveillance systems date: - - journal: online j public health inform doi: . /ojphi.v i . sha: doc_id: cord_uid: fhb m most automated disease surveillance systems notify users of increases in the prevalence of reports in syndrome categories and allow users to view patient level data related to those increases. occasionally, a more dynamic level of control is required to properly detect an emerging disease in a community. dynamic querying features are invaluable when using existing surveillance systems to investigate outbreaks of newly emergent diseases or to identify cases of reportable diseases within data being captured for surveillance. the objective of the advance querying tool (aqt) is to build a more flexible query interface for most web-based disease surveillance systems. this interface allows users to define and build their query as if they were writing a logical expression for a mathematical computation. the aqt allows users to develop, investigate, save, and share complex case definitions. it provides a flexible interface that accommodates both advanced and novice users, checks the validity of the expression as it is built, and marks errors for users. in its annual report, the world health organization warned of the increased rate at which diseases spread in a world where billion people travel by air [ ] . the early detection of known and emerging illnesses is becoming more important. automated disease surveillance systems have been in existence for over years [ ] [ ] [ ] . most of these systems analyze data by syndrome and search for disease outbreaks. a syndrome in this context is defined as a group of diseases related in some fashion, such as respiratory diseases. this level of investigation is often sufficient, but a more dynamic level of control may be required to understand an emerging illness in a community. for example, during the - severe acute respiratory syndrome (sars) disease epidemic [ ] , the respiratory syndrome definition used by most automated disease surveillance systems was too broad to track sars [ ] . in this case, the users needed to create queries that looked for specific keywords in the patient chief complaint or specific combinations of icd- codes [ ] . a chief complaint is text entered by a triage professional in an emergency room or a clinic, based on a patient's description of their primary symptoms. today's public health departments must deal with a multitude of data coming from a variety of sources. for example, electronic medical record (emr) data include sources such as radiology, laboratory, and pharmacy data. a more sophisticated querying tool is needed to assist investigators with creating inquiries across multiple data sources [ ] [ ] [ ] . currently, there are surveillance systems, such as the electronic surveillance system for the early notification of community-based epidemics (essence) [ ] , which provide limited dynamic querying capability. however, we wanted to design a flexible and simple graphical user interface (gui) for this and other types of surveillance systems. 
our prototype system, the advanced querying tool (aqt), allows the investigators to handle complex cases where one can incorporate any data elements available in a disease surveillance system, then mix and match these data elements in order to define valid queries. hence, this system removes the need for database administrators and application developers to define pre-packaged database queries and user interfaces every time a new and innovative query is written. as an example, investigating a potential influenza outbreak in an adult population may require respiratory syndrome queries only, while investigating a similar outbreak in children under years old may involve queries in both gastrointestinal and respiratory syndromes ( figure ). table provides examples of how a dynamic query tool exploits combinations of data elements available to disease surveillance systems. most automated disease surveillance systems have a fixed number of predefined syndromes. these applications severely limit the surveillance system value for diseases that fall outside of its broad syndrome categories. the background noise level rises when all the chief complaints that potentially fall into a syndrome category are included, which in turn requires many more positive cases to identify an abnormal condition. merely adding sub-syndrome categories, that are more granular than syndromes and cover a broader range of conditions than typical syndromic surveillance like injures and chronic disease [ ] , provides the users with a more comprehensive means to filter the analysis window. if a disease surveillance system has sub-syndromes, then taken singly the user has additional choices; by combining two or three sub-syndromes, the analysis options are magnified to over ten million choices. of course not all of these options are sensible, so the actual number of options is somewhat less. even greater analytic flexibility is provided through the use of data elements contained within electronic medical records. the capability to select a combination of a microbiology laboratory result, radiology result, and icd- code provides for a powerful tool that enables the public health community to rapidly identify specific high risk patients. the following objectives summarize the design features of the aqt: the tool's interface will help generate queries that can process any kind of data regardless of its source (e.g., emergency room visit, office visit, pharmacy, and laboratory). unlike fixed-form query interfaces, aqt will not restrict users in what they can query. instead, the user will be able to formulate ad-hoc queries across assorted data sources without the need to understand the underlying data models and the query languages associated with different systems. in addition, using this tool should save investigators' valuable time in obtaining the query results. currently, if the surveillance system cannot generate the desired queries, the application developers and/or database administrators may have to create new interfaces or functionalities. the aqt, however, empowers the users to move forward with their research without waiting for developer or administrator modifications to the surveillance systems. the interface will accommodate users with different levels of experience in creating complex and valid queries. the process will be natural and follow the same patterns that one uses to express a mathematical equation. 
at the same time, it will give the more experienced users, who are familiar with the data elements, the freedom to define complex queries by sidestepping the guiding tools. the advanced users will have the ability to type in their queries and the tool will validate them and provide feedback on possible syntax errors. the interface will allow users to save and share queries with other public health professionals, even in different jurisdictions. after defining a complex query the user has the ability to store the query for future investigations. one should be able to execute the stored query repeatedly in the future, include it as a segment of a bigger query, or customize and execute it. these saved queries can then be shared as part of collaborative efforts among users in different departments and jurisdictions. aqt will provide an interface for disease surveillance systems to store, retrieve, and share queries. these capabilities are especially valuable for users employing a case definition for following a disease outbreak. a case definition is a set of symptoms, signs, etc., by which public health professionals define those patients considered to be directly a part of the disease outbreak. finally, the tool should be self-contained and generic. this allows most web-based disease surveillance systems to incorporate the aqt into their systems. the entire functionality of the tool is placed within a single web page ( figure ). the screen in figure is divided into major sections. starting at the top, the user can filter the data by picking the data source from a dropdown list, start and end date. the surveillance system should supply this list of data sources to the aqt. the next area below is the message area where the gui communicates with the user. any information, warnings, or error messages are displayed in this section. the next area, the query section, contains the query expression. the users can either directly type the query expression or use the tool to generate the query expression and paste it in this area. alternatively, they can use a combination of the two methods by typing part of the expression and pasting the rest using the query builder. the query section is followed by the query builder section where the tool provides list boxes, buttons, etc., to direct the user through the process of generating the query expression. the bottom section is where an action on the query is performed. users can validate the expression's syntax, save the query for their own future use, save it to be shared with others in the user community, clear the query expression and start over, or simply execute the query and get the results. as mentioned earlier, the capability to generate queries on data from a variety of sources is one of the objectives of the aqt. each data source has its own distinctive set of data elements. the interface has to provide a list of data elements pertaining to the chosen data source. for example, the data might represent different geographic regions from one data source to the other. that is, one source might have data identified by zip codes while another source uses some other type of defined region such as hospitals, pharmacies, and schools. another area where data sources can be different is in medical groupings. for example, office visits often use icd- codes [ ] , while emergency departments use patient chief complaints. the interface is designed to distinguish valid data elements for each data source and populate the data element list box accordingly. 
after selecting a data source the tool populates a list box with a set of associated data elements for the data source. the list box is divided into three major areas: • the geography system • the medical grouping system • others such as age, sex, saved and shared queries. figure shows how the medical grouping systems differ for emergency room (right) and over the counter (left) data sources. as mentioned earlier, a main objective of the aqt is to provide an interface that caters to both novice and experienced users. the experienced users simply type the query, while beginners and those who are more comfortable with a guided interface can use list boxes and buttons to generate the queries. in fact, one can type part of the query and use the tool to generate the rest of the query (figure ) . when a user types a query directly, it is assumed that the user knows the syntax and valid data elements pertaining to the data source, though the tool does check the syntax and provide feedback. because we want the users to define and build their query as if they were writing a logical expression for a mathematical computation, the syntax is simple and close to the "where" clause of a structure query language (sql) statement. however, one does not need to know sql to write the expressions. a query consists of one or more simple expressions joined by "and" and/or "or," negated by "not," and grouped by parentheses. a simple expression is enclosed within square brackets ([]) and defined by a variable, a logical operator, and a value. for example, if an investigator is searching for reported fever cases within a specified zip code, the query then consists of two simple expressions; one which searches for the specified zip code and the other which checks the fever syndrome. the final query may look like the expression below: if the investigators want to narrow the search into a certain age group they can type or use the tool to add and [age = " - "] to the above expression. hence, the users can add more conditions without worrying about the underlying data model. the most complex part of the syntax occurs when searching for values that contain, start with, or end with a set of characters ( figure ). in this case, the syntax uses "*" as the wildcard character. for example, a user would type [chief-complaints = "*head*"] in the query box if he/she is looking for all the records of chief-complaints that include the word "head." similarly, if a user types [chief-complaints = "head*"] or generates it using the tool (selects the starts with from the operator list box and types head in the text field), the resulting query would search for all the records where the chief-complaints field begins with the word "head." the procedure for generating expressions follows the same pattern a person would use to create a logical expression. the interface will provide a natural flow to help the users to create an expression as if they are typing it. they may start with selecting a data element or variable such as 'sex', then a logical operator like '=', and finally a value like 'male' or 'female'. the user can add 'and' or 'or' and create the next expression using this same process. the user can interject expressions in the middle of a query, remove parts of the query, or undo the last change made to the query. as changes are being made, the tool validates the entire query in the background and provides instant feedback. 
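The grammar described above (simple [variable operator "value"] expressions joined by and/or, negated with not and grouped with parentheses) is small enough to check with a single left-to-right token scan, which is essentially what the background validation has to do. The Python sketch below is only meant to make the syntax rules concrete; the actual AQT performs this check in JavaScript in the browser, and details such as keyword casing and which operators each data element accepts are assumptions here.

```python
import re

TOKEN = re.compile(
    r'\s*(?:(?P<expr>\[[^\[\]]+?(?:=|<>|>=|<=|>|<)\s*"[^"]*"\s*\])'
    r'|(?P<andor>\b(?:and|or)\b)'
    r'|(?P<neg>\bnot\b)'
    r'|(?P<lpar>\()'
    r'|(?P<rpar>\)))\s*',
    re.IGNORECASE)

def is_valid_query(text: str) -> bool:
    """Check that `text` is a well-formed query expression as described above."""
    pos, depth, expect_operand = 0, 0, True
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            return False                       # unrecognised characters: syntax error
        pos = m.end()
        kind = m.lastgroup
        if kind == "expr":
            if not expect_operand:
                return False                   # two expressions with no and/or between them
            expect_operand = False
        elif kind == "andor":
            if expect_operand:
                return False                   # and/or must follow a complete expression
            expect_operand = True
        elif kind == "neg":
            if not expect_operand:
                return False                   # not must precede an expression or group
        elif kind == "lpar":
            if not expect_operand:
                return False
            depth += 1
        elif kind == "rpar":
            if expect_operand or depth == 0:
                return False
            depth -= 1
    return pos > 0 and depth == 0 and not expect_operand

print(is_valid_query('[zip code = "12345"] and ([syndrome = "fever"] or [age >= "65"])'))  # True
print(is_valid_query('[syndrome = "fever"] [age >= "65"]'))                                # False
```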
this method of constructing queries is more intuitive to the users than that of creating all the individual expressions first and then joining them together. once the data source is selected, a list of core data elements is provided in a list box. from the list box the user can select a data element. based on the type of the data element, a list of valid logical operators for that data element is placed in another list box. figure shows the list of valid operators for text fields. in cases such as zip code and syndrome, '=' and '<>' operators are also valid. for age the operators '>', '<', '<=', and '>=' are added to the list. once the user selects a data element, a list of valid values pertaining to the data element is listed in yet another list box. the user can select one or more of these values, and if more than one value is selected the user can choose to group these values using 'and' or 'or'. note that the aqt generates the expression in a left to right progression in the same manner as one typing the expression (figure ). the next step is to add this expression to the query. by clicking on the "add expression" button, the expression is pasted at the cursor location in the query area. one can add more expressions to this query by clicking and or or buttons and following the same process ( figure ). the aqt helps users quickly identify limits for variables with large sets of values. because data elements such as zip codes and icd- codes have a lot of values for dropdown lists, finding a particular value in these list boxes is very cumbersome. the tool provides an intermediate step for filtering these options into a more manageable list (figure ). for example, if the investigators are interested in data from certain zip codes in a state, they can reduce the options by typing the first two digits of the zip code and thereby filtering the list. the tool will generate valid expressions and provide a mechanism to check the query expressions when a user types parts or all of them. every time an expression is generated by the tool and the add expression button is clicked, the tool examines the entire query expression, checking it against the syntax rules. before saving or executing the expression the aqt automatically checks the syntax and if it detects any syntax errors it will provide meaningful error messages in the message area ( figure ) . additionally, at any point the user can click on the validate button and check the syntax. frequently, investigators want to execute a query over time, run the same query with different values, or use the query inside more complex queries. similarly as all the other data elements (zip code, syndrome, region, etc.), the permanent storage and retrieval of queries (file system, database, or any other mechanism) are the responsibility of the disease surveillance system. the aqt is merely an interface to assist the investigators with their research by hiding the complexity and inner workings of the underlying data model. once the users define the desired query they can click on [save public expression] or [save private expression] buttons. if the query is valid, the screen provides an area to enter a unique name for the query (figure ). if the query is successfully validated the aqt passes the name and query expression to the surveillance system. it is the surveillance system's responsibility to confirm that the query's name is unique and provide feedback to the aqt the success or failure of the save operation. 
based on the feedback received the aqt provides an appropriate message in the message area. in a collaborative environment users would like to share their findings and queries with others. providing the capability to save and share the queries for collaborative use enables others in the user community to run these queries as they are or to make the modifications necessary to help with their own investigations. the aqt facilitates saving public queries by providing an interface similar to saving private queries ( figure ). the surveillance system should implement the inner workings of the permanent storage and retrieval of public queries. the next step is retrieving these saved queries. there are two options in the data element list box in the query builder section of the aqt: one option is for retrieving the private saved queries, and the other option is for retrieving public saved queries ( figure ). upon selection of either one, a list of corresponding queries will be presented to the users. this list includes the text of the query and the unique name given to that query. by clicking on the query name the saved query will be added to the expression in the query area. at this point users can add more conditions to the same query, such as specifying a zip code, changing the value for age, etc. the final objective of this project is for the aqt to have the capability to be used with most web-based surveillance systems. one can think of the aqt as a widget, or an add-on with some defined interfaces. the back end can be implemented in a variety of popular technologies such as .net, java servlet, or any other server technology as long as it can communicate via an http protocol. the surveillance system has to provide the interfaces that supply values for the different parts of the screen, and the functionality to parse the final query text and run it against the underlying database. making the tool adaptable to many web-based systems requires the aqt to contain all the processing dynamically, including validating the query syntax and changing the contents of the list boxes. in a web-based environment, this means using browser components such as html, cascading style sheets (css) [ ] , javascript, and the document object model (dom) [ ] to implement application logic. in developing aqt, we utilized html, javascript, and ajax (asynchronous javascript and xml) and placed all the processing on the local machine to avoid any server dependency. we used javascript to apply validation, data handling, and screen processing on the browser side, and ajax for communicating with server applications. ajax is used for creating interactive web applications and is a cross-platform technique usable on many different operating systems, computer architectures, and web browsers, because it is based on open standards such as javascript and xml. the intent of this technique is to make web pages more responsive by exchanging small amounts of data with the server behind the scenes, so that the entire web page does not have to be reloaded each time the user requests a change. this feature increases the web page's interactivity, speed, functionality, and usability. ajax is asynchronous in that loading does not interfere with normal page loading. the aqt uses ajax calls to obtain required data for populating the different list boxes on the screen. 
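Those asynchronous calls, together with the save operations described earlier, only require a small HTTP contract on the surveillance system's side. The Python/Flask sketch below shows one possible shape of that contract; the routes, payload fields, data-element lists and in-memory storage are all invented for illustration and do not describe the interfaces of any existing surveillance system.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# hypothetical in-memory stores standing in for the surveillance system's database
DATA_ELEMENTS = {
    "emergency room": ["zip code", "region", "hospital", "syndrome", "sub-syndrome",
                       "chief-complaints", "age", "sex"],
    "over the counter": ["zip code", "region", "product category", "age", "sex"],
}
SAVED_QUERIES = {"public": {}, "private": {}}

@app.route("/data-elements")
def data_elements():
    """Return the data elements valid for the requested data source, used to fill
    the AQT list boxes via an asynchronous call."""
    source = request.args.get("source", "").lower()
    return jsonify(DATA_ELEMENTS.get(source, []))

@app.route("/queries/<scope>", methods=["POST"])
def save_query(scope):
    """Persist a named query; the server, not the AQT, enforces name uniqueness."""
    if scope not in SAVED_QUERIES:
        return jsonify({"ok": False, "error": "scope must be public or private"}), 400
    body = request.get_json(force=True)
    name, expression = body.get("name"), body.get("expression")
    if not name or not expression:
        return jsonify({"ok": False, "error": "name and expression are required"}), 400
    if name in SAVED_QUERIES[scope]:
        return jsonify({"ok": False, "error": "name already in use"}), 409
    SAVED_QUERIES[scope][name] = expression
    return jsonify({"ok": True})

if __name__ == "__main__":
    app.run(port=5000)
```

A production back end would instead look these values up in the surveillance database and enforce per-user access to private queries.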
for example, when the user selects a data source the tool calls the surveillance system, passes the selected data source, gets a list of data elements from the server (the surveillance system), and then populates the data element list box. the communication to the server is done by an ajax call, and the javascript processes the returned data and populates the list. essence has been one of the early adaptors of aqt. although the capability to create efficient custom queries for emergency room chief complaints data existed prior to the aqt, the query building process was cumbersome and not user-friendly. it was easy to make syntax errors while typing a query, and there was no mechanism to validate the logic of the query statement. furthermore, while "and" and "or" and "andnot" expressions were possible, there was no method to construct complex boolean operations with parentheses to clarify the order of operations. the previous capability allowed the user to base the custom query on data source, geography system, or medical grouping system, however, since the selections were not part of the query statement they could not be modified without returning to the pre-selection screens and re-starting the query process. additionally, the original capability did not allow for querying of data beyond the fundamental chief complaints-level. the following screen shot shows the query options that were available with the original feature. a sample chief complaints query designed to capture influenza-like-illness is shown in figure . the aqt not only contains several capabilities that were not previously available, but also provides an intuitive user-friendly interface that allows the user to build simple or highly ^cough^,and,^fever, or,^sorethroat^,and, fever^,andnot, ^asthma^ complex queries more easily. two new features in the aqt are parentheses, which allow the user to clarify the order of operations, and the ability to select variables such as region, zipcode, hospital, syndrome, sub-syndrome, chief complaint, age, and sex, as part of the query statement. this allows for easy query modifications. additionally, the aqt lets the user query data beyond the fundamental chief complaints level into a more sensitive sub-syndrome or syndrome level. it also allows users to develop queries that contain combinations of chief complaints, syndromes, and sub-syndromes into one query. the query can also contain combinations of different geographies such as zipcodes and regions. this capability is not available without aqt. during the query building process the aqt automatically validates the logic of query expression as it is created, and the user has the option to conduct a final validation prior to executing the query. this feature allows the user to quickly identify syntax errors and correct them before adding further complexity or executing the query. the following screen shot ( figure ) shows the query options available within the aqt feature. a sample chief complaints query designed to capture influenza-like-illness in region_a is shown. world health organization. 
the world health report -a safer future: global public health security in the st century electronic communication and the future of international public health surveillance planning a public health surveillance system a statistical algorithm for the early detection of outbreaks of infectious disease epidemiology and cause of severe acute respiratory syndrome (sars) in guangdong, people's republic of china challenges faced by hospital healthcare workers in using a syndrome-based surveillance system during the outbreak of severe acute respiratory syndrome in taiwan icd cm expert for hospitals, th ed. salt lake city improving safety with information technology can electronic medical record systems transform health care? potential health benefits, savings, and costs physicians' use of electronic medical records: barriers and solutions essence ii and the framework for evaluating syndromic surveillance systems standardizing clinical condition classifiers for biosurveillance css: the missing manual javascript: the definitive guide advanced querying features for disease surveillance systems the author would like to express his appreciation to colleen martin and jerome tokars of the u.s. centers for disease control and prevention, to sanjeev thomas of science applications international corporation, and to wayne loschen, joseph lombardo, jacqueline coberly, rekha holtry, and steven babin of the johns hopkins university applied physics laboratory. we believe that the aqt will provide an interface that can assist public health investigators in generating complex and detailed case definitions. the interface supports saving queries for future use and sharing queries with others in the user community. the interface is intuitive and accommodates both novice and experienced users. finally, the aqt is a selfcontained tool that can be plugged into most web-based disease surveillance systems with relative ease. the author declares that he has no competing interests. what was already known on the topic • early detection of known and emerging illnesses is becoming vital with the increased rate at which diseases spread world-wide. • most automated disease surveillance systems analyze data by syndrome and look for disease outbreaks within a community, hence overlooking the diseases that fall outside of the broad syndrome categories. what this study added to our knowledge • electronic disease surveillance systems need a more sophisticated querying tool to assist public health investigators in conducting inquires across multiple data sources. • superior analytic flexibility through the use of data elements contained within electronic medical records enables the public health community to rapidly identify specific high risk patients. • the advanced querying tool (aqt) was designed as a flexible and simple graphical user interface (gui) that allows users to develop, investigate, and share complex case definitions. key: cord- -mc xa om authors: lam, simon c.; lui, andrew k.f.; lee, linda y.k.; lee, joseph k.l.; wong, k.f.; lee, cathy n.y. title: evaluation of the user seal check on gross leakage detection of different designs of n filtering facepiece respirators date: - - journal: am j infect control doi: . /j.ajic. . . sha: doc_id: cord_uid: mc xa om background: the use of n respirators prevents spread of respiratory infectious agents, but leakage hampers its protection. manufacturers recommend a user seal check to identify on-site gross leakage. however, no empirical evidence is provided. 
therefore, this study aims to examine validity of a user seal check on gross leakage detection in commonly used types of n respirators. methods: a convenience sample of nursing students was recruited. on the wearing of different designs of n respirators, namely m- s, m- , and kimberly-clark , the standardized user seal check procedure was carried out to identify gross leakage. repeated testing of leakage was followed by the use of a quantitative fit testing (qnft) device in performing normal breathing and deep breathing exercises. sensitivity, specificity, predictive values, and likelihood ratios were calculated accordingly. results: as indicated by qnft, prevalence of actual gross leakage was . %- . % with the m respirators and . %- . % with the kimberly-clark respirator. sensitivity and specificity of the user seal check for identifying actual gross leakage were approximately . % and . % for m- s, . % and . % for m- , and . % and . % for kimberly-clark , respectively. likelihood ratios were close to (range, . - . ) for all types of respirators. conclusions: the results did not support user seal checks in detecting any actual gross leakage in the donning of n respirators. however, such a check might alert health care workers that donning a tight-fitting respirator should be performed carefully. the unremitting worldwide outbreaks of different infectious respiratory diseases, such as severe acute respiratory syndrome, multidrug-resistant tuberculosis, avian influenza a (h n , h n , h n , and h n ), and human swine influenza (h n ), - have caused increased awareness of occupational protection among health care workers. therefore, use of n filtering facepiece respirators (also known as n respirators) to prevent spread of droplets transmitted and potential airborne infectious diseases is recommended internationally through announcements by the world health organization (who) and u.s. centers for disease control and prevention (cdc). , regardless of the shapes or brands of such respirators, they are generally a tight-fitting half facepiece type, and their reliability is simply dependent on fit to the wearer. according to a laboratory performance evaluation conducted by the cdc, the average penetration by ambient aerosol was found to be % in ill-fitting respirators compared with % in well-fitting respirators. it is believed that the gap existing between the respirator and the wearer's face contributes to such penetration, which is often regarded as leakage. to achieve creditable occupational protection, most well-known authorities, such as the national institute for occupational safety and health, cdc, and who, made fit testing compulsory for wearers prior to use of an n respirator. , in hong kong, fit testing should be a mandatory measure for frontline health care staff working in public and private hospitals. quantitative fit testing (qnft) is a recognized method to determine whether a tight-fitting respirator fits a wearer. this method adopts an electronic device to measure the ratio of particular air particles inside and outside the breathing zone (when donned with a respirator), and the ratio reflects the degree of leakage. to make it simple, qnft is "an assessment of the adequacy of respirator fit by numerically measuring the amount of leakage into the respirator." 
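the quantity behind this definition is simply a concentration ratio; the sketch below writes it out, with the pass/fail cutoff left as a placeholder assumption rather than the value prescribed by the fit testing protocol.

# sketch of the quantity measured by quantitative fit testing (qnft): the ratio of
# ambient particle concentration outside the respirator to the concentration that
# leaks inside. the cutoff below is an assumed placeholder, not the protocol value.
GROSS_LEAKAGE_THRESHOLD = 100.0  # assumption for illustration

def fit_factor(concentration_outside: float, concentration_inside: float) -> float:
    """higher values mean less leakage into the respirator."""
    return concentration_outside / concentration_inside

def has_gross_leakage(concentration_outside: float, concentration_inside: float) -> bool:
    return fit_factor(concentration_outside, concentration_inside) < GROSS_LEAKAGE_THRESHOLD

# example: 5000 ambient particles outside vs 100 inside gives a fit factor of 50
print(fit_factor(5000.0, 100.0), has_gross_leakage(5000.0, 100.0))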
to assess any possible leakage, most of the preset fit testing systems require the wearer with a donned n respirator to perform a series of exercises, including a static portion without body movement (ie, normal and deep breathing) and a dynamic portion with both normal breathing and designated movements (ie, side-to-side head movement, up and down head movement, talking or reading a standard set of passages, grimacing, bending over). these exercises simulate the common working activities in the clinical environment; hence, the results of qnft can conservatively reflect any possible leakage. the characteristics of objective measurement and an automatic process increase the significance of qnft, which now serves as the gold standard in worldwide guidelines and research literature. [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] although qnft warrants reliability of n respirator usage, any significant change in facial morphology, body weight, or donning method may contribute to on-site leakage. , , therefore, even if a given respirator is considered fit by the recognized fit testing, a user seal check is still suggested in order to check the appropriateness of every donning. , [ ] [ ] [ ] a user seal check is a self-examination procedure for wearers of n respirators to identify on-site gross leakage through repeated visual checks on obvious gaps and positive and negative pressure checks on the seal. , , [ ] [ ] [ ] n respirator manufacturers and some authorities recommend that this practice should be routinely carried out by frontline health care workers. , , previous experimental studies on u.s. subjects suggested that the user seal check improved the donning of n respirators. , although the scale of these studies was not large enough (n = and n = ), , the rigor of the experimental design and the use of repeated measurements increased the credibility of the results. some guidelines suggest that no further fit testing is needed for a given respirator if subjective leakage is detected by a user seal check. this check may substitute for fit testing if fit testing is not available because of logistic difficulties or failure of the fit testing system. several recent studies, nevertheless, have consistently rejected this suggestion of substitution. , , , in hong kong, a retrospective study demonstrated that the user seal check failed in determining the fit of n respirators because its false-positive ( %- %) and false-negative ( %- %) rates were too high among chinese nursing staff. lam et al further supported the previously mentioned claim through prospective studies on chinese nursing students (n = and n = , respectively) by presenting the sensitivity ( %- %), specificity ( %- %), positive ( %- %) and negative ( %- %) predictive values, and kappa values (- . to - . , p > . ) of the user seal check. , in a canadian study, similar results and conclusions were also reported on research involving health care workers (false positive rate: %- %). the congruent results indicate that the user seal check cannot replace the fit testing. it is believed that the user seal check, which does not involve any dynamic body movement, is unlikely to mirror the fit testing results because the latter assessment is performed when the wearer performs sequential exercises involving a series of head and body movement. however, the user seal check may still be able to identify onsite gross leakage and give some information on the gross leakage on normal breathing or deep breathing without head and body movement. 
given its immense implication on occupational protection, its validity has not yet been rigorously studied. therefore, the research question was as follows: can the result of the user seal check reflect the actual gross leakage under the conditions of normal and deep breathing? this study, hence, aimed to examine the sensitivity, specificity, predictive values, and likelihood ratios of the user seal check on actual gross leakage detection during normal breath-ing or deep breathing without head and body movement in common respirator models of different designs. this study used a descriptive, prospective, and cross-sectional research design. from september -december , a convenience sample of chinese students who studied in different nursing programs (ie, year of bachelor's or higher diploma program) in a local university was invited to participate by internal e-mails and several announcements. data collection consisted of phases (ie, registration, training session, fit testing session) (fig ) . the demographic data of participants (sex, body height, and weight), the results of the user seal check, and the results of actual gross leakage detection through the fit testing device were recorded in a data sheet. during the registration, all participants were required to sign the consent form and prepare themselves in the same manner and appearance of clinical practicum (eg, pinning up long hair, shaving). in the training session, a -minute training, including video, demonstration, and practice on standardized n respirator donning technique and the user seal check method, was introduced by trained registered nurses. through redemonstration, the donning techniques and user seal check method of each participant were assessed by these nurses prior to moving on to the next session. apart from the time used in registration and the training session (various times among the participants), it took another minutes to complete the remaining process of data collection, namely the user seal check and qnft on the exercises of normal and deep breathing for the given types of n respirators, where the sequence of testing remained unchanged for all participants. to control the environmental factors, such as the concentration of suspended particles and dusts, which may affect the result of fit testing, all of the data were collected in an assigned air-conditioned room with an area of m , temperature at approximately °c, and humidity at approximately %. to perform a user seal check, the wearer subjectively assessed and adjusted the position and tightness of a given n respirator through a visual check and positive and negative pressure checks. details on the steps and methods for the user seal check can be found in previous studies. , , a positive result is indicative of subjective gross leakage. , the portacount pro+ respirator fit tester (tsi, st paul, mn) was adopted to measure the actual gross leakage. the details, including technologic information and protocol setting of this system, were introduced elsewhere. , , currently, this system is widely adopted in public and private hospitals in hong kong and is used as a local quality control standard by respirator manufacturers. figure shows the fit tester system, tubing connection, and respirator. all of the participants were only required to perform the static portion out of the specified exercises (ie, normal breathing, deep breathing). 
in this portion, the participants should remain still in a normal standing position and breathe as usual for seconds before taking long deep breaths as if working hard for another seconds. the research nurses monitored and assessed chest movement by visual inspection to estimate adequacy of the depth (fig ) . these exercises gave particular individual fit factors (ffs; range, - ). each ff is the ratio of a challenge agent (ambient particles) concentration outside the respirator to the concentration of a challenge agent that leaks into the inside of the respirator. a ff < under the normal breathing and deep breathing exercises is defined as actual gross leakage. , the higher the ff, the lesser amount of leakage. the portacount pro+ respirator fit tester went through a daily check procedure to warrant the sufficiency of ambient particles and performance of the system. the cup-shaped m- s ( m, minneapolis, mn) ( m-a), -panel designed m- ( m) ( m-b), and pouch-type kimberly-clark (kimberly-clark, neenah, wi) (kc-c) n respirators were selected. the selection was based on reasons. first, these models are typically and widely used in local clinical settings. second, previous studies demonstrated that the prevalence of the fittesting failure rate was approximately % for the m models. , , [ ] [ ] [ ] it is estimated that the prevalence of actual gross leakage would be lower than that. according to our previous experience on qnft, the obtained ffs of normal and deep breathing were generally higher than that of the other exercises. extreme prevalence rates, such as < % or > %, greatly deteriorated the accuracy of both positive and negative predictive values. , the prevalence rate of actual gross leakage among the different designs of respirators should be within the optimal range for calculation of the predictive values. finally, it is unrealistic and unnecessary to include all types of n respirators for fit testing. in general, most of them were designed under these categories. this study used a representative respirator from each category; hence, the results could provide a better evaluation on the validity of the user seal check. ethical approval was sought from the president's advisory committee on research and development, the open university of hong kong. an invitation letter was prepared. information about the purposes of the study, right to confidentiality, right to withdrawal, and duration of fit testing and a consent statement were provided. participants' written consent was obtained prior to data collection. descriptive statistics were used to present the participants' demographic variables and the results of the user seal check (ie, positive, negative) and actual gross leakage (ie, pass, fail). independent sample t tests were undertaken to test for the difference between participants in the groups (positive and negative user seal checks) with regard to their results of ff. the significance level was set at p < . . the results of the user seal check compared with the gold standard qnft on actual gross leakage through cross tabulation were used to compute the following diagnostic parameters: sensitivity, specificity, positive and negative predictive values, and likelihood ratios (refer to the "note" in table for the respective formula). the sensitivity (ability of the user seal check to correctly identify a case with gross leakage) and specificity (ability of the user seal check to correctly identify a case without gross leakage) were calculated from the measurements. 
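the diagnostic parameters named here all follow from the two-by-two cross tabulation of user seal check results against qnft results; the sketch below collects the standard formulas, together with the odds-based post-test probability that underlies the nomogram used later in the discussion. all counts and probabilities in the example calls are made up for illustration.

# standard screening-test metrics from a 2x2 table of user seal check (index test)
# versus qnft gross leakage (reference standard). the tp/fp/fn/tn counts are illustrative.
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    sensitivity = tp / (tp + fn)   # correctly flagged cases with actual gross leakage
    specificity = tn / (tn + fp)   # correctly cleared cases without gross leakage
    ppv = tp / (tp + fp)           # probability of leakage given a positive seal check
    npv = tn / (tn + fn)           # probability of no leakage given a negative seal check
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "lr+": lr_pos, "lr-": lr_neg}

def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """convert probability to odds, scale by the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# illustrative counts only (not the study data)
metrics = diagnostic_metrics(tp=12, fp=30, fn=25, tn=80)
print(metrics)
print(post_test_probability(pre_test_probability=0.20, likelihood_ratio=metrics["lr+"]))

with likelihood ratios close to one, the post-test probability stays close to the pre-test prevalence, which is the pattern reported for the user seal check in the results that follow.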
according to the evaluation of the performance characteristics of diagnostic tests in the medical literature, a combination of high sensitivity and specificity (> %) [ ] [ ] [ ] is equally important and is an indication of the characteristics of the user seal check itself (ie, test's ability). because the user seal check is applied in clinical practice, additional performance evaluations, positive and negative predictive values, are necessary to help interpret the results. a value ≥ % is considered to be satisfactory for both predictive values. another method for describing the screening accuracy of the user seal check is the likelihood ratios. the ratios have an advantage over the aforementioned sensitivity, specificity, and predictive values because they are independent of the prevalence of actual gross leakage and hence can be applied across settings and populations. according to the recommendation of using probabilistic reasoning, , the user seal check is moderately good at ruling in leakage if the positive likelihood ratio is > . conversely, such a check is moderately good at ruling out leakage when the negative likelihood ratio is < . . values close to . represent that the user seal check is useless in predicting the presence or absence of actual gross leakage. a total of nursing students participated in the study. for those who did not participate or were excluded, the reasons included who were physically unfit (eg, asthmatic attack, influenza), who were absent (eg, withdrew from the program), and who had unshaven bushy facial hair. the participants ranged from - years of age, and . % of them were men (n = ). their mean height was . ± . cm, and their weight was . ± . kg. as far as the ffs between a group of positive and negative user seal checks were concerned, generally the participants with negative user seal checks obtained an observable higher score in the types of respirators compared with those with a positive check. however, only significant differences were found regarding the use of the kc-c respirator (t = . - . , p = . -. ) ( table ) . the results of the user seal check compared with that of actual gross leakage performed by qnft are presented in tables and . among the participants, . % (n = ), . % (n = ), and . % (n = ) reported positive user seal checks regarding the m-a, m-b, and kc-c respirators, respectively. however, the prevalence of actual gross leakage identified by qnft in normal breathing was . %- . % in both of the m res- pirator models and . % in the kc-c model. in deep breathing, the prevalence was similar, . %- . % in both of the m respirator models and . % in the kc-c model. testing on the different respirators in the breathing conditions, the sensitivity and specificity of the user seal check for identifying a case with actual gross leakage ranged from . %- . % and . %- . %, respectively (table ). extreme prevalence rates caused deviation of positive predictive values and negative predictive values. according to the current results on prevalence rates of actual gross leakage (ie, between . % and . %), further evaluation on the characteristics of the test's performance of positive and negative predictive values was regarded as appropriate. regarding the test of the m respirators, the positive predictive values of a positive user seal check for estimating the probability of actual gross leakage ranged from . %- . %, whereas the negative predictive values ranged from . %- . %. 
in contrast, the test of the kc-c respirator showed different patterns, which were of relatively high positive predictive values ( . %- . %) and low negative predictive values ( . %- . %). finally, both the positive and negative likelihood ratios indicating the post-test probability of the user seal check were close to . (positive likelihood ratio range, . - . ; negative likelihood ratio range, . - . ). table presents the detailed results. concerning the m respirators, the observed differences of the ffs between a group of participants with positive and negative user seal checks were minimal and these differences were not statistically significant at all ( . - . vs . - . , respectively). although a significant difference was found for use of the kc-c respirator, the mean score of the ff of a group of negative user seal checks (no subjective gross leakage) was still < (a detection of actual gross leakage), which implies that kc-c respirator is difficult to fit chinese participants. concerning donning with m respirators, the prevalence of actual gross leakage in this study ( %- %) was slightly lower than that of the failure rate of fit testing in previous studies ( %- %). , , this was not surprising because the fit testing examines the degree of leakage during a series of exercises, whereas the actual gross leakage is computed only based on the measured ff on the static portion. however, the actual gross leakage that was found in the kc-c respirators was still frequent (up to %). it may imply that a higher failure rate on fit testing of this model was expected among the chinese population. this warrants future empirical testing. the positive user seal checks ranged from %- % in the current study, which is comparable with that of previous studies ( %- %). , , in some occasions, participants who felt the gross leakage of a given respirator (assessed by the user seal check) passed the fit testing in normal and deep breathing (ie, false-positive rate: . %- . %). in contrast, more frequently, participants subjectively expressed the good fit of a given respirator, but the actual gross leakage was still detected by qnft in normal or deep breathing mode (ie, falsenegative rate: . %- . %). similar observations were consistently reported in the literature, , which reinforced that the leakage between the face and respirator is unlikely identified by human sense. the literature indicated the sensitivity and specificity of the user seal check in determining the fit of n respirators were %- % and %- %, respectively. , such results suggested that the user seal check cannot replace the fit testing because the fit testing simulated a series of head and body movement on leakage detection. the current study hypothesizes that the user seal check may contribute to the detection of gross leakage in normal and deep breathing, which is important information during on-site donning. , however, based on the unacceptable sensitivity ( . %- . %) and specificity ( . %- . %) in the current results, the hypothesis that the user seal check is able to detect actual gross leakage in normal and deep breathing is also rejected. interestingly, the sensitivity and specificity of the user seal check in determining the fit of n respirators and in detecting gross leakage are fairly comparable. such a phenomenon may imply that leakage in normal and deep breathing shall predict the result of fit testing. however, further empirical testing is warranted to work out this possibility. 
to illustrate the clinical implication of the current results of predictive values and likelihood ratios, by using an example of donning the m-a respirator, an interpretative summary of the validity and test performance of the user seal check for identifying actual gross leakage is presented as follows. the prevalence of the actual gross leakage was approximately % ( . - . %, as indicated in table ) when donning the given respirator, which was interpreted as pretest probability. , before conducting any kind of testing, a randomly selected nurse wearing the m-a respirator would have a % chance of having actual gross leakage. predictive values vary according to the prevalence of the actual gross leakage. high prevalence tends to have higher positive predictive value, whereas low prevalence tends to have higher negative predictive value. [ ] [ ] [ ] the current prevalence of actual gross leakage was approximately % as mentioned, which was satisfactory in further calculating post-test probability. this nurse then performs a routine user seal check to ensure the absence of subjective gross leakage. likelihood ratios help to calculate post-test probability of actual gross leakage. the current results indicated that positive and negative likelihood ratios were . and . , respectively. therefore, with these ratios, the chance of the nurse with a positive user seal check having actual gross leakage is . % ( % × . ), whereas a negative user seal check reduces the chance of the nurse having such leakage from % to . % ( % × . ). figure illustrates such probabilities through the nomogram. based on this example, the practice of the user seal check provides limited information in predicting the actual gross leakage when donning the given respirator. several limitations deserve discussing. one is that only brands of respirators (ie, m and kimberly-clark) were used for gross leakage detection through qnft. although our aim was not to investigate the prevalence of gross leakage of all different models of n respirators, it was possible that different results might be obtained with different respirators. nevertheless, we believe that the results supported the unacceptably low sensitivity and positive predictive value and futile likelihood ratios of the user seal check in identifying gross leakage of respirators. apart from this, participants' characteristics might affect the passing rate of fit testing. first, most participants were novice users, except that some worked in clinical settings as health care workers. previous experience and knowledge of donning an n respirator were insufficient, which may influence the passing rate of qnft on gross leakage detection. this is different from a previous study, where viscusi et al recruited subjects who were required to pass a standard qnft. therefore, the current results may underestimate the passing rate of qnft. second, asian participants' weight (reported here) and facial anthropometries (eg, face length, face width; not reported here) were significantly different from that of non-asian people, which hence affects the passing rate of qnft. such differences might reduce the generalizability of the results but increase the specificity of that to asian populations. concerning environmental factors, the average monthly humidity in hong kong (subtropical climate) ranged from %- % in ; yearly humidity computed from - was . %. most hospitals are only equipped with central air conditioning systems, and indoor humidity of wards may vary from %- %. 
relatively high humidity might underestimate the positive result of the user seal check in the current study because participants rely on subjective comparison between inward and ambient air to detect the leakage. unlike well-controlled internal hospital settings in other regions, these environmental differences may limit the current results in that they are less relevant to other settings but are highly situation-specific results for many hospitals located in subtropical climate regions. further studies are recommended to replicate the works from myers et al and viscusi et al, which examined the effectiveness of the user seal check on improving n respirator donning among asian wearers. another study may investigate how the change of body weight and facial anthropometries of asian health care workers contributes to leakage of n respirators. it is difficult to cite any evidence on the value of the user seal check on determining the fit of n respirators or even detecting any actual gross leakage during normal and deep breathing. however, the practice of the user seal check might contribute to enhancing the donning procedure of a respirator. although the leakage is difficult to identify by subjective human sense, this check draws our attention to the issue that the tight-fitting respirator should be worn carefully. effectiveness of precautions against droplets and contact in prevention of nosocomial transmission of severe acute respiratory syndrome (sars) h n outbreaks and enzootic influence influenza pandemics of the th century face mask use and control of respiratory virus transmission in households multidrugresistant tuberculosis outbreak among us-bound hmong refugees worldwide emergence of extensively drug-resistant tuberculosis epidemic and pandemic alert and response (epr): infection prevention and control of epidemic-and pandemic-prone acute respiratory disease in health care: who interim guidelines guidelines for preventing health-care-associated pneumonia, : recommendations of cdc and the healthcare infection control practices advisory committee laboratory performance evaluation of n filtering facepiece respirators regulations (standards- cfr), occupational safety and health standards the fitness of n respirators among undergraduate chinese nursing students in hong kong predictive value of the user seal check in determining half-face respirator fit protecting healthcare staff from severe acute respirator syndrome: filtration capacity of multiple surgical masks respiratory protection by respirators: the predictive value of user-seal-check for the fit determination in healthcare settings sensitivity and specificity of the user-sealcheck in determining the fit of n respirators testing of the sensitivity and specificity of the user-seal-check procedure on "gross leakage" of n respirators racial differences in respirator fit testing: a pilot study of whether american fit panels are representative of chinese faces portacount pro respirator fit testers: operation and service manual respirator donning in post-hurricane new orleans m occupational health and environmental safety division qualitative fit testing instructions (kcpi- ) use of personal protective equipment for respiratory protection effectiveness of fit check methods on half mask respirators evaluation of the benefit of the user seal check on n filtering facepiece respirator fit health care workers and respiratory protection: is the user seal check a surrogate for respirator fit-testing? 
understanding diagnostic tests : sensitivity, specificity and predictive values how to report statistics in medicine the sensitivity, specificity, and predictive value of traditional clinical evaluation of peripheral arterial disease: results from noninvasive testing in a defined population systematic reviews of evaluation of diagnostic and screening tests evaluation of sensitivity and specificity of rapid influenza diagnostic tests for novel swine-origin influenza a (h n ) virus foundations of clinical research: applications to practice diagnosis in general practice: using probabilistic reasoning evidence-based medicine: how to practice and teach ebm the year's weather- we thank all participants for their contribution to this study; jojo y.y. kwok, billy o.y. pang and rebecca c.m. tsang for supervision on data collection; ka-yan chan, suen-fuk fan, nga-yi lau, sinting tai, tsz-kwan yuen, wai-hung chung, and wing-on lui for assistance on data collection and data input; andy c.y. chong for provision of statistical support; and our division for support on specific equipment, consumables, and room usage. key: cord- -jyfmnnf authors: holzapfel, kilian; karl, martina; lotz, linus; carle, georg; djeffal, christian; fruck, christian; haack, christian; heckmann, dirk; kindt, philipp h.; koppl, michael; krause, patrick; shtembari, lolian; marx, lorenz; meighen-berger, stephan; neumair, birgit; neumair, matthias; pollmann, julia; pollmann, tina; resconi, elisa; schonert, stefan; turcati, andrea; wiesinger, christoph; zattera, giovanni; allan, christopher; barco, esteban; bitterschulte, kai; buchwald, jorn; fischer, clara; gampe, judith; hacker, martin; islami, jasin; pomplun, anatol; preisner, sebastian; quast, nele; romberg, christian; steinlehner, christoph; ziehm, tjark title: digital contact tracing service: an improved decentralized design for privacy and effectiveness date: - - journal: nan doi: nan sha: doc_id: cord_uid: jyfmnnf we propose a decentralized digital contact tracing service that preserves the users' privacy by design while complying to the highest security standards. our approach is based on bluetooth and measures actual encounters of people, the contact time period, and estimates the proximity of the contact. we trace the users' contacts and the possible spread of infectious diseases while preventing location tracking of users, protecting their data and identity. we verify and improve the impact of tracking based on epidemiological models. we compare a centralized and decentralized approach on a legal perspective and find a decentralized approach preferable considering proportionality and data minimization. we propose a decentralized digital contact tracing service that preserves the users' privacy by design while complying to the highest security standards. our approach is based on bluetooth and measures actual encounters of people, the contact time period, and estimates the proximity of the contact. we trace the users' contacts and the possible spread of infectious diseases while preventing location tracking of users, protecting their data and identity. we verify and improve the impact of tracking based on epidemiological models. we compare a centralized and decentralized approach on a legal perspective and find a decentralized approach preferable considering proportionality and data minimization. the current covid- pandemic poses challenges to our society on a scale unheard of in recent times. 
while the direct consequences of the pandemic are felt in healthcare and medical sectors, the quarantining and isolation measures required to slow the outbreak have a major impact on the psychological and economic welfare of people. one measure that has been applied in a digital and analogue manner is the tracing of contacts. state authorities resorted to this measure in order to find out about new infectious persons and prevent further spreading of the disease by quarantining them. many of the analogue solutions are problematic because state authorities might not have the capacity to question and pursue contacts and when they do, they touch upon the privacy of citizens. a first wave of governmental apps have been criticized on the basis that they could be used for surveilling citizens. as a reaction, more privacy friendly approaches have been proposed. the experiences with existing apps have shown that potential loopholes are exploited even by citizens in order to identify infected persons and e.g. shame them on social media . hence, the issue arose whether there is an effective digital solution that also takes into account ethical, legal, and societal concerns. therefore, our concept puts forward ideas to improve the decentralised concept. our current proposal aims at fighting infectious diseases in an effective manner while safeguarding and realizing the citizens' rights, freedoms and legitimate interests. we focus on privacy and it-security concerns. in the spirit of a privacy by design solution, we incorporated legal principles and requirements into the very design of our solution. while our first point of reference was the european union's general data protection regulation (gdpr) / , there exist similar and equivalent principles and requirements in many legal orders, including the council of europe's convention . these include: • lawfulness, fairness and transparency (art. sec. subsec a. gdpr) • purpose limitation: data shall be "collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes" (art. sec. subsec b. gdpr) • data minimization: data shall be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed" (art. sec. subsec c. gdpr) • accuracy: data shall be "accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that personal data that are inaccurate, having regard to the purposes for which they are processed, are erased or rectified without delay" (art. sec. subsec d. gdpr) • storage limitation: data shall be "kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed" (art. sec. subsec e. gdpr) • integrity and confidentiality: "processed in a manner that ensures appropriate security of the personal data, including protection against unauthorised or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures" (art. sec. subsec f. gdpr) while the concept laid out here addresses many principles and requirements, its actual implementation will contain further technical and organizational measures in order to mitigate risks and further issues. this is particularly true for data subjects rights like the right to rectification and the right to erasure. 
a further development of the application will also have to look into other issues like inclusion, fairness, transparency and effective governance of the application. we present a secure solution for a digital contact tracing service (dcts) that protects the users' privacy, identity and personal data from attackers. encounters, their proximity, and duration are required in order to properly track contacts of people and infection chains. we propose the use of bluetooth, a short range wireless communication protocol, as a means to measure these quantities. bluetooth detects only real encounters and works indoors as well as outdoors (e.g. underground in subways or in buildings), where location (e.g. by gps) and mobile network data is not reliable anymore. bluetooth is a technology standard available on every mobile phone and thus provides the ideal global instrument to register encounters on local devices. we present the general concept of dcts in section , the technical details and implementation aspects in section , and consider possible attack scenarios in section . we cover related work in section , and provide a legal perspective in section and conclude in section . in this section, we outline the general concept of dcts. details for the technical implementation are covered in section . we propose a mobile bluetooth application (app) to introduce dcts in the society. our proposed app permits the registration of relative encounters while preserving the privacy of its users by design. the concept is based on the following principle: each mobile phone equipped with the dcts app advertises temporary contact numbers (tcns) to other phones. at the same time, it records and stores the tcns advertised by other phones. phones continuously advertise their random tcns and store the observed random tcns of neighboring devices, while users can simply follow their daily routine activities (such as office, school, theater, etc.). in case users are infected, they can agree to an upload of their advertised tcns to a server after approval from medical authorities. every app user continuously checks the server and gets information on the tcns related to people tested positive for the virus. a matching operation done on the user's device reveals them to the user only if a potentially infectious contact has happened. in this way, each user is informed about potential infectious contact without revealing so to another party. the identity of the infectious person and their social graph remains protected. the user installs the dcts app. the app activates bluetooth and generates a key, which it uses to generate a random tcn. the phone then proceeds to advertise the random tcn via bluetooth, such that other devices in the vicinity of the user can see the tcn. this tcn is updated after a certain time in order to minimize re-identification of the user. the app stores the advertised tcns for a period of two or three weeks, depending on what is sensible for the infectious period of the virus. the key for tcn generation is updated every day and is stored on the phone. this ensures that key compromising can not deanonymize users' past movements. in parallel to the advertising operation, the bluetooth activated on the device continuously scans for other devices in its vicinity. when neighboring devices are detected, the app stores the observed tcns, the time, and signal strength on the phone. 
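the on-device bookkeeping described above — every observed tcn together with the time of the sighting and the received signal strength — can be sketched as a small local table; the schema and column names below are illustrative assumptions, and the actual app would keep this store encrypted on the phone.

# sketch of the local encounter log kept on the phone: one row per observed
# advertisement, holding the tcn, the time it was seen, and the received signal
# strength. schema and names are illustrative assumptions.
import sqlite3
import time

def open_encounter_db(path: str = "encounters.db") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS observed_tcns (
                      tcn     BLOB    NOT NULL,
                      seen_at INTEGER NOT NULL,  -- unix timestamp of the scan result
                      rssi    INTEGER NOT NULL   -- received signal strength in dBm
                  )""")
    return db

def record_observation(db: sqlite3.Connection, tcn: bytes, rssi: int) -> None:
    db.execute("INSERT INTO observed_tcns (tcn, seen_at, rssi) VALUES (?, ?, ?)",
               (tcn, int(time.time()), rssi))
    db.commit()

def drop_old_observations(db: sqlite3.Connection, retention_days: int) -> None:
    cutoff = int(time.time()) - retention_days * 24 * 3600
    db.execute("DELETE FROM observed_tcns WHERE seen_at < ?", (cutoff,))
    db.commit()

repeated sightings of the same tcn in this log yield the contact duration, and the stored signal strengths feed the proximity estimate discussed next.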
the period of exposure can be calculated using the saved timestamps and the proximity can be evaluated based on the received signal strength. we only store encountered tcns for a time period of two or three weeks. we show a sketch of the tcn exchange between different phones in fig. . if a user is tested positive for the virus, the patient is encouraged by medical authorities to provide the tcns advertised over a period of two or three weeks. the patient is informed about how their identity is protected, including the description of the risk of identification via possible attacks (see section . ). if the patient agrees, their advertised tcns are uploaded to a server where they are verified and encrypted before being made available. the patient gets the permission by a medical authority to upload the generated tcns, and keys to a server. this permission can be granted in various ways (see section ). the patient then proceeds to upload the keys. the server regenerates and verifies the patient's tcns with the provided keys. this verification prevents impersonation of other users. the server deletes the keys after the verification. this scenario is shown in fig. . the patient provides the keys used for tcn generation to the medical personnel, e.g. by showing a qr code with the keys to an authorized person. the medical personnel then verifies the tcns by regenerating them and uploads the tcns to the server following an authentication procedure (e.g. username and password of medical authority). the medical personnel deletes the keys after the upload. scenario ensures the user's anonymity towards the server and does not require to use mobile data or wifi. the app generates a new key for future tcn generation after the upload. • generates its own tcn and stores the tcn with time • updates its own tcn regularly if tested positive: • george agrees to upload his random tcns to the cloud • physician gives permission to upload the random tcns tcns … … george figure scenario : when a user receives a positive test result, they get the permission to upload their random tcns from the past two or three weeks to the server. the server collects newly uploaded tcns for a predefined period of time, e.g. one hour, and shuffles their order to avoid the association of several tcns to a single user. then, the server stores the shuffled batch of tcns to its main database. this enables users' apps to check whether they were in contact with the patient. tcns are only stored for two/three weeks and are then automatically deleted. users in their daily life have the app working on their devices. in parallel to the continued advertisement of tcns, the app checks regularly, e.g., once per hour, for new tcns on the server. if new tcns are present, the app retrieves information about them from the server. the patients' tcns are then matched against the encountered tcns registered on the device of the user during a period of two/three weeks. this matching is done on the user's phone. if the app detects a match, the user receives a notification that a potentially infectious encounter has been detected (fig. ) . this notification includes recommended actions, such as self-quarantining and calling a number or visiting a website with contact details for medical authorities. the notified user can also be asked to proceed and provide his/her tcns to allow a recursive tracing. we recommend consulting psychologists about the exact wording and information of this notification in order to achieve the desired effect. 
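the matching and risk grading just described can be sketched as follows; the duration and signal-strength thresholds are placeholder assumptions, not values prescribed by the concept above.

# sketch of the on-device check: match the downloaded tcns of patients against the
# locally observed tcns, then grade matched contacts by duration and proximity.
# the thresholds are placeholder assumptions.
from collections import defaultdict

def matched_contacts(patient_tcns, observations):
    """observations are (tcn, seen_at, rssi) rows from the local encounter log."""
    sightings = defaultdict(list)
    for tcn, seen_at, rssi in observations:
        if tcn in patient_tcns:            # matching happens only on the user's phone
            sightings[tcn].append((seen_at, rssi))
    return sightings

def exposure_category(sighting_times, rssi_values):
    duration_s = max(sighting_times) - min(sighting_times)
    strongest_rssi = max(rssi_values)      # larger (less negative) means closer
    if duration_s >= 900 and strongest_rssi >= -65:   # assumed: long and close contact
        return "high exposure"
    if duration_s >= 300 or strongest_rssi >= -75:
        return "medium exposure"
    return "low exposure"

# usage: for tcn, rows in matched_contacts(patient_tcns, local_log).items():
#            times, rssis = zip(*rows)
#            print(exposure_category(list(times), list(rssis)))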
we present two approaches on how to detect encounters in the following. in order to check whether the user has been in contact with an infected person, they download all unchecked tcns stored on the server and check for matches within their own list of observed tcns. when encountering matches, the app can perform a risk assessment based on exposure time period and proximity. the risk assessment can be included in the notification. this approach is vulnerable to an attack described in section . . a possible solution for this vulnerability is described in the following section. the previous scenario is vulnerable to the attack scenario described in section . . an attacker can put a device with the app together with a video camera at a public place, record the broadcasted tcns and can later check for infected tcns on the server. if the attacker has recorded an infected person, they can potentially assign infected tcns to people on the camera. in order to avoid this attack, we can check for the number of tcns that are both in our set of encountered tcns and in the set of infected tcns. an algorithm determining the private set intersection cardinality with low communication cost (see for example ) is a valuable strategy in order to discover the exposure to infectious contacts without risking the identification of the patient (details are described in section . ). if intersection cardinality is used, the risk assessment can still be done by e.g. introducing three categories: high exposure, medium exposure, and low exposure. the encountered tcns are sorted into these categories, depending on the contact period and the proximity. then the number of overlapping tcns in each category can be checked, which provides a measure for user exposure. several studies have confirmed that covtid- is infectious before people develop symptoms , . this leads to a spread of the virus before people get diagnosed and can isolate. for example, person a gets infected. before developing symptoms, person a infects person b. person a still shows no symptoms and b continues living normally and infects person c. now a develops symptoms. after a got the diagnosis, person b is notified, but person c is still oblivious. thus, it is possible for the virus to spread much faster than direct contacts can be traced. in , it is shown that infection chains can only be stopped if indirect contacts are traced as well. we enable person b to upload their tcns to the server as soon as b gets the notification of having been exposed. if b uploads their tcns, person c gets a notification before infecting anybody else. for more details see section . . we explain the details of implementation and protocols in this section. a more general overview of the concept is provided in section . our proposed approach uses bluetooth low energy (ble) for detecting devices in range. the procedure using which two wireless devices establish a first contact in a wireless network is called neighbor discovery. in ble, devices periodically broadcast packets with an interval for neighbor discovery. for reducing the probability that multiple consecutive packets of different devices are sent at the same point in time and hence collide, a random delay between and ms is added to each instance of . in addition, devices listen to the channel for a time window of length every time-units. a device has successfully discovered another one, once a beacon from the opposite device coincides with one of its reception windows. 
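the neighbor discovery mechanics just outlined can be made concrete with a toy simulation; the advertising interval, scan window, scan period, and random-delay bound used below are placeholder values (the parameters actually used by the operating system are discussed next), so the resulting latencies are only illustrative.

# toy monte carlo of ble neighbor discovery: an advertiser sends a beacon every
# advertising interval plus a small random delay, while a scanner listens for
# scan_window out of every scan_period. discovery happens when a beacon falls
# inside a listening window. all parameter values are placeholder assumptions,
# and collisions between multiple advertisers are ignored.
import random

def discovery_latency_s(advertising_interval=0.25, scan_window=0.03,
                        scan_period=0.30, max_time=60.0):
    """return the time until the first beacon coincides with a reception window."""
    t = random.uniform(0.0, advertising_interval)    # first beacon at a random phase
    scan_offset = random.uniform(0.0, scan_period)   # scanner phase is independent
    while t < max_time:
        if (t - scan_offset) % scan_period < scan_window:
            return t
        t += advertising_interval + random.uniform(0.0, 0.010)  # assumed 0-10 ms random delay
    return max_time                                  # treat as not discovered in time

latencies = [discovery_latency_s() for _ in range(10_000)]
print(sum(latencies) / len(latencies))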
we have estimated the performance when two smartphones discover each other (cf. for details). the choices of the advertising interval, the scan window, and the scan period supported by the android operating system are not officially documented. we have therefore looked them up in the source code of the android operating system. due to scheduling conflicts, the values actually used could differ during runtime. we nevertheless found that for certain configurations, the latency measured from the point in time at which two devices come into range until discovery is successful is below s during normal operation, i.e., when no scheduling conflicts occur. such latencies are practical for contact tracing. we also found that continuous contact tracing has no significant impact on the smartphone battery runtime. we expect that the battery is drained by no more than % by contact tracing, while the energy demand is even significantly below that in most cases. finally, we investigated the behavior in crowded situations, where a large number of devices are in range of reception. here, the packets from multiple devices could potentially collide. we found that even in situations with devices being close to each other, the probability that all devices discover each other successfully within s is close to %. in our approach, we chose the most beneficial parameters for ble based on this evaluation. we thereby ensure that contact tracing is carried out with the highest possible reliability and the lowest possible energy consumption. distance estimation is done by evaluating the received signal strength indicator (rssi) provided for each received packet. this estimation is known to be error-prone. in our approach, we eliminate as many sources of error as possible, while classifying a contact as significant by jointly considering the rssi and contact duration. this reduces the rate of false positives and negatives. our approach is similar to the contact tracing suggested by apple and google in this aspect. we suggest modifying their approach by using a completely random daily key (dk_i) every day to ensure forward secrecy. in the original approach, a leak of the private key allows an attacker to reproduce all past and future daily tracing keys. from , the pseudo random tcn (pseudorandomtcn_j) generation looks as follows: pseudorandomtcn_j ← truncate(hmac(dk_i, utf8("ct-rpi") || tin_j), l), where tin_j is the time interval number, the n-th minute of the day (e.g. : would be in the second time interval, thus tin_j = ), and l is the length to which the hmac output is truncated for advertising. this time interval number prevents rebroadcasting the tcns of other users in other time intervals. bluetooth also advertises the mac-address of the device. according to our observations, this address changes after a certain time and is also changed when bluetooth is activated. we were able to observe this behaviour in our tests on android and as well as ios . we stop and restart advertising immediately when updating the tcn, such that the advertised mac-addresses change at the same time as the tcns. this prevents any malicious association of a mac-address to several tcns. these changing tcns keep the user anonymous and complicate tracking. the dcts app stores the keys and the generated tcns in a database (e.g., sqlite encrypted with sqlcipher) that is physically present on the device. the maximal length of advertising data for bluetooth is byte (for bluetooth .x). the advertisement includes a service universally unique identifier (uuid) of bit. this uuid identifies the advertisement as a dcts advertisement to other devices. the pseudo random tcns are then advertised as additional data with a size of bytes.
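a minimal, runnable sketch of the daily key and tcn derivation described above follows; the truncation length, the length of a time interval, and the byte encoding of the time interval number are assumptions chosen for concreteness.

# sketch of the daily-key / pseudo random tcn derivation. the 16-byte truncation,
# the 10-minute interval length, and the 2-byte big-endian encoding of the time
# interval number are assumptions for illustration.
import hmac
import hashlib
import os
from datetime import datetime, timezone

TCN_LENGTH = 16  # assumed truncation length in bytes

def new_daily_key() -> bytes:
    return os.urandom(32)  # completely random key, drawn fresh every day

def time_interval_number(now: datetime, interval_minutes: int = 10) -> int:
    minutes_since_midnight = now.hour * 60 + now.minute
    return minutes_since_midnight // interval_minutes + 1  # assumed 1-based numbering

def pseudo_random_tcn(daily_key: bytes, tin: int) -> bytes:
    message = b"CT-RPI" + tin.to_bytes(2, "big")
    return hmac.new(daily_key, message, hashlib.sha256).digest()[:TCN_LENGTH]

dk = new_daily_key()
tcn = pseudo_random_tcn(dk, time_interval_number(datetime.now(timezone.utc)))
print(tcn.hex())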
another method for random tcn generation is presented in . their described technique is similar and we are currently evaluating which approach is the most secure and privacy preserving and offers the most protection to the user. the app registers only dcts-advertisements of devices in the vicinity, using a filter for the service uuid in the bluetooth scans. simultaneously, the app advertises the user's tcns with a size of byte. for android, advertised tcns can be read out by the scan-callback and saved into the sqlite database. when a pseudo random tcn is seen, the app calculates a contacteventtcn ( ): where is the time interval number of the contact time. additionally, the app saves the contact time and the received signal strength indicator (rssi) of the advertisement. the proximity can be estimated using the rssi. the contact time period can be calculated if a tcn is registered several times. both time period and proximity can then be combined to a degree of exposure to the virus. we note here that rssi is a relative quantity and can differ for different chips. a possibility to calibrate rssi could be to evaluate the range of rssi within the first days of taking data. within these days the user most probably has had close and distant encounters with people. this reveals an estimate of the highest and lowest range of rssi. we could then estimate the proximity with rssi calibrated on its maximal and minimal value. however, rssi also depends on many factors, such as for example the orientation of the devices' antennas, whether the line-of-sight is obstructed, potentially humidity, and the channel on which a packet is sent. we will rule out these errors whenever possible. the remaining error then impacts the required contact time. if e.g. the estimated distance is lower than the actual distance, the devices would need to be longer in their vicinity in order to count as relevant contact. only medical personnel have access or can grant access to the server to perform the upload. this minimises the misuse of reported tcns of patients who are tested positive for the virus. the app offers an interface allowing medical personnel to either upload data to the server or to grant access to the server. in order to ensure correct use of the app, a short instruction or training for the use of the app needs to be provided. also, we need to identify medical personnel in order to provide them with credentials for the server upload. this can be done by either contacting test centers directly or by getting the relevant contacts from the health office, to which the infected individuals are reported to. doctors and institutions authorized to report people as infected can then be contacted and provided with the credentials and the instruction for the use of this app. patients are asked to upload their pseudo random tcns after consultation with the medical personnel. the permission is provided via a token or an access code or a tan at the doctor's office or the test center. a qr code or a tan are provided by the doctor to the patient. alternatively, the access code or tan is provided via letter together with the test result (for recursive tracing). the code or tan is then only valid for a single use for a restricted period of time. inserting the tan or scanning the qr code triggers the upload of the keys used for generating the random tcns for the past two/three weeks. an ip anonymization protocol such as tor can be used to avoid revealing the patient's ip-address to the server during the upload of the tcns. 
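the contacteventtcn computation is described above only in words; a plausible sketch consistent with that description — hashing the observed pseudo random tcn together with the time interval number of the contact and the date — is given below, together with a conventional log-distance model for turning an rssi reading into a rough proximity estimate. the hash layout, the field encodings, and the path-loss constants are all assumptions for illustration.

# sketch of a contacteventtcn derivation (hash of observed tcn, time interval number,
# and date, per the verbal description above) and of a rough rssi-based distance
# estimate using the standard log-distance path loss model. encodings and calibration
# constants are illustrative assumptions.
import hashlib
from datetime import date

def contact_event_tcn(observed_tcn: bytes, tin: int, day: date) -> bytes:
    message = observed_tcn + tin.to_bytes(2, "big") + day.isoformat().encode("utf-8")
    return hashlib.sha256(message).digest()

def estimate_distance_m(rssi_dbm: float, rssi_at_1m_dbm: float = -60.0,
                        path_loss_exponent: float = 2.5) -> float:
    """log-distance path loss: the signal falls off with 10*n*log10(d); constants are assumed."""
    return 10.0 ** ((rssi_at_1m_dbm - rssi_dbm) / (10.0 * path_loss_exponent))

print(contact_event_tcn(b"\x01" * 16, tin=2, day=date(2020, 4, 1)).hex())
print(round(estimate_distance_m(-75.0), 2))  # roughly a few metres under these assumptions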
once the keys are on the server, the server then verifies the tcns by regenerating them with the keys for each day and all possible time interval numbers. the server then calculates for each pseudorandomtcn the corresponding contacteventtcn (see eq. ( )) with the respective time interval number and date. after tcn verification, the server deletes the keys. this verification ensures that patients can not simply upload observed tcns of other people to mark them as infected. if the user does not have access to wifi, the upload of the tcns can be done in the test center or doctor's office or over mobile data. the user hands their keys to the doctor providing a qr code. the medical personnel receives the keys via the dcts app and regenerates the user's pseudorandomtcns with the fixed time interval numbers. the pseudorandomtcns are then hashed together with their respective time interval numbers and date (as shown in eq. ( )) to generate the contacteventtcns. the medical authority then proceed to upload the user's tcns to the server using their credentials to access the server. the server collects all verified tcns for a short period of time, for example one hour. the collected tcns get sorted or shuffled and saved into a database (for example as sqlite database encrypted using sqlcipher ). for the private set intersection cardinality protocol, the server computes ( ( )) for all shuffled tcns (see section . ). the server allows the dcts users to download the database (or the bloom filter) of the encrypted tcns. these hourly releases assure that a set of tcns cannot be linked to one person, because the tcns of several people are combined and their order is changed by either shuffling or sorting. sorting the tcns makes sense if the user directly downloads the tcns, such that they can do a binary search for their contacteventtcns. the user can then do hourly queries to get timely notifications in case of a contact. if the user queries the server more sporadically, they get the newly added data since their last query. to prevent attacks that might identify the patients, the dcts app can use private set intersection cardinality to determine the number of infectious contacts. several possible methods to determine the private set intersection cardinality exist, for example or the protocol described in . we have not yet decided which specific protocol to use, but we present the working principle of the latter protocol. the server has a set of infected tcns ( ), and the user has a set of encountered contacteventtcns ( ). both user and server have locally a set of public and secret keys (user: , ; server: , ). the user now shuffles scheme of the private set intersection cardinality algorithm described in . with this protocol users determine how many of their observed tcns have been marked as infected on the server. the matching happens on the users' phones, the server can not infer the number of intersections. their and encrypts them ( ( )). they then send ( ) to the server. the server also shuffles and encrypts ( ), such that the server sends ( ( )) back to the user. we use a commutative encryption scheme, such that ( ( )) = ( ( )). the user now decrypts ( ( )) and gets ( ). a commutative encryption scheme is for example pohlig-hellmann or sra . on the server side, the server encrypts its and applies a bloom filter ( ) on ( ). this step can be precomputed for all uploaded tcns. the server then sends ( ( )) to the user. the user also applies a bloom filter on ( ) for each of their encountered tcns. 
the user then checks, for each of their encountered tcns, whether e_s(c) occurs in bf(e_s(s)). the steps of the protocol are displayed in fig. . this reveals to the app how many observed tcns are in the set of patients' tcns saved on the server. this information is only obtained on the user's device itself; no such information is leaked to the server. the user does not know which contacteventtcn belongs to an infected person, and thus we ensure the patient's anonymity. this of course makes it impossible to assign the remaining stored data, such as exposure time period or proximity, to a single direct contact. however, private set intersection cardinality does not prevent the possibility of providing the user with a risk assessment determined from the time period and signal strength of the encounter. in order to provide such a service, the app can check for an encounter first. then the contacteventtcns are divided into sub-categories, for example high exposure, medium exposure, and low exposure, and the set intersection can be checked for each category, which allows for a risk assessment for the user. for this second query step it is sufficient to receive the encrypted, bloom-filtered patient set from the server and intersect it with the previously received encrypted contact tcns. if there was no encounter, random tcns are picked from the observed tcns to simulate the second query step to the server, such that the server does not know whether the user has actually been in contact with an infected person or not. to prevent the server from recognizing the same tcns, a new key pair needs to be used every time the mobile device queries the server. the server requires a minimum number of tcns to be queried in order to prevent the user from querying single tcns and potentially identifying infected users. if a user only has one received contacteventtcn, the list of encountered tcns is padded with randomly generated tcns, such that they can still query the server. this increases the chance of a false positive result. we introduce a limit on the query rate, such that the app can only send a limited number of requests to the server at a time, in order to prevent brute force attacks. users then have to wait a certain period of time until they can query the server anew. the server's public key can additionally change for each combined hourly uploaded patient tcn set. this complicates a brute force attack further, because the attacker needs a different bloom filter of their encountered tcns for each hourly dataset. using this approach to private set intersection cardinality might reveal the number of encountered tcns to the server. this is not the case if we allow the user to directly download all tcns of infected people and check for matches on their phone. using the latter method, the server gets no information whatsoever about non-infected app users (except possibly their ip-address); however, the user could identify infected people using an attack as described in section . . as an update to this protocol, the bloom filter can be replaced with a cuckoo filter, which has the benefit of lower error probabilities for the same size. this is also what is used in . the contactum group is evaluating the efficiency of dcts together with different intervention strategies. the results are being cross-checked using both deterministic and monte carlo based model approaches . the modeling substantiates the following prerequisites of a dcts.
first of all, the dcts needs a broad acceptance among the population of more than % in order to have an impact to control an outbreak. we believe this can only be achieved by a decentralized, secure, and privacy preserving design where the users own their data. our goal is to contribute to slow down and eventually to stop the spread of the pandemic using the means of contact tracing complying with privacy laws, it-security standards and the protection of human rights. second of all, tracing of people who were in direct contact with a confirmed infectious person might not be sufficient. sars-cov- can be transmitted pre-symptomatic and a significant fraction of cases is asymptomatic . to illustrate the presymptomatic transmission we consider the following case: person a has been infected with sars-cov- . a can spread the virus to person b already up to - days before symptom onset. b is now a first order contact of a. with a non-negligible likelihood, person b can expose person c to the virus (which had no contact to person a) before person a develops symptoms. c is thus a first order contact of b and a second order contact of a. once person a develops symptoms and is positively tested, person b is notified after having already spread the virus to person c. this emphasizes the need for digital contact tracing and the necessity to not only notify person b promptly, but also person c. we want to enable first order contact tracing (in the previous example: a's contacts: e.g. b), and additionally second order contact tracing (e.g. all of b's contacts, in our example: c). a user who receives a notification that there has been a contact with an infected person can then contact medical authorities and get the permission to upload their random tcns to the server. another possibility to proof contact with an infected person is using a so called zero-knowledge proof. a zero-knowledge proof allows person b to proof that the server and b share a secret value without revealing any information aside the fact that they both know the value. in our case, b can prove knowledge of an infected tcn to the server without revealing the infected tcn. this can then authenticate b to upload their tcns to the server. these tcns can be marked separately, such that the user can get different notifications, depending if there was a direct confirmed exposure or an indirect exposure. tracing second order contacts increases significantly the number of traced potentially infected people. if every direct and indirect contact stayed in quarantine, a huge percentage of the population would be affected. thus, tracing indirect contacts requires rapid testing of potentially infected people. for example, if person b in the above mentioned example is tested negative, person c does not require to stay in quarantine as the likelihood that person c has been infected by person b is small. based on the rapid test results, health authorities can then decide who needs to isolate and who can continue with everyday life as usual. this avoids quarantining large fraction of the population during advanced stages of an epidemic. we evaluate possible attack scenarios in this section. these attack scenarios are not theoretical. reports form south korea show that attacks are done against users and that a state adversary is using the data to invade users' privacy. this emphasizes the importance of an approach, which collects minimal data and where such attacks are prevented by design. 
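as a small illustration of the first and second order tracing logic described above, the sketch below walks the contact graph one and two hops away from a confirmed case and then clears second order contacts who are only reachable through an intermediate contact that has tested negative. the data structures and function names are hypothetical and deliberately ignore timing, risk scores and the zero-knowledge authorization step.

```python
# direct (first order) contacts observed via exchanged tcns, per user
contacts = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
}

def notify_orders(index_case: str):
    """return (first order, second order) contacts of a confirmed case."""
    first = set(contacts.get(index_case, set()))
    second = set()
    for person in first:
        second |= contacts.get(person, set()) - first - {index_case}
    return first, second

def clear_after_negative_test(first_order, second_order, negative_person):
    """keep only second order contacts still reachable via another first order contact."""
    still_at_risk = set()
    for person in first_order - {negative_person}:
        still_at_risk |= contacts.get(person, set())
    return second_order & still_at_risk

first_order, second_order = notify_orders("A")
print(first_order, second_order)                            # {'B'} {'C'}
print(clear_after_negative_test(first_order, second_order, "B"))  # set(): C can be released
```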
more attack scenarios will follow in a second draft of this document. a possible attacker could get the identity of infected people if they can directly access the tcns of the infected people on the server (see section . . ). the attacker can install the app on a device and install the device somewhere together with a video camera. the camera records the people passing by and the device records the peoples' advertised tcns together with the time. after some days, one of the people who passed the device and the camera finds out that they have been infected and uploads their tcns to the server. the attacker can now download them and compare them with the tcns on their device. the app detects a match. now the attacker can check the matching tcns and access the time on the database. then, the attacker checks the video feed at that time and can possibly identify the infected individuals. defense: the use of private set intersection cardinality protects further the identity of infected people. the attacker can still query the server multiple times to find out which of their contact tcns was infected. thus, a rate of queries needs to be limited to make these attacks more expensive. this type of linkage attack is in general possible with any proximity tracing app, which exchanges tcns and notifies users. any tech-savvy user can either use several devices, register their app mulitple times, modify the app and record the identities of other users. with psi-cardinality and a rate limit, this attack becomes difficult and expensive. attackers can use received contact tcns and rebroadcast them as their own. thus, people could get notifications of exposure triggered by falsely broadcasted tcns even though they have not been at risk. in case of an infected attacker, the users would not get any notification, because the wrong tcns, not the attackers' own tcns, have been advertised. defense: each tcn is concatenated with the time interval number. this time interval number is valid for ten minutes. if an attacker received a tcn from a neighbouring phone, they could rebroadcast this tcn to other users. in case this tcn is later marked as infected, only users who received this tcn from the attacker within the ten minute time interval will get a notification. the introduction of the time interval number reduces the validity of each tcn and thus puts a limit on this rebroadcasting attack. in the case of an infected attacker, the users would indeed never get notified of their possible exposure, since the attacker uploads their own tcns. this would be the same if a person had their bluetooth switched off, refused to upload their tcns, or if they had not installed the app at all. an attacker could try to upload somebody else's tcns in order to mark them as infectious. defense: when receiving tcns, the key needed for tcn generation is not transmitted. thus, if the user is asked to upload their pseudo random tcns, they need to provide the key they used for generating these tcns. the server or the medical personels directly generates the tcns with the keys, such that the tcns are verified and the attacker cannot simply exchange tcns. the attacker could of course try to provide fake keys, in which case the uploaded tcns would not lead to any encounter notification, because the tcns were never broadcasted. if an attacker wants to force someone to quarantine (e.g. manipulating sport events or annoying a neighbour), they can get into close proximity to this person and then try to self-report themselves as infected. 
defense: the server is not directly accessible to the user. a user can only connect to the server if they have been granted access by medical personnel (e.g. via a tan or a qr code). medical personnel only grant access if the user has tested positive for covid- . the only case where self-reporting is possible is if the user has been exposed to an infected user, received the notification, and can prove to the server with a zero-knowledge proof that they know an infected tcn. this knowledge serves as authorization to the server and allows the user to upload their own keys for tcn generation. these can be marked as second order tcns and enable second order tracing of contacts as described in section . . we present a summary of currently suggested methods and protocols in this section. we also highlight major differences to our approach and where we improve privacy and security aspects. all approaches, except the last one in section . , are similar to our design of dcts. a broader overview of current efforts (including location based approaches) can be found in . we present the tcn protocol of the tcn coalition, which we are also part of, in this section. smartphones generate periodically changing tcns and advertise them via bluetooth. neighboring devices store observed tcns. an infected user uploads the generated tcns to a server together with an additional memo field where report data can be written. all users then download the list of reported tcns and check whether they have been exposed. this proposal provides server privacy and receiver privacy; however, it does not provide source integrity and is vulnerable to linkage attacks. the tcn generation uses two report keys, a report authorization key rak and a report verification key rvk, which are used to compute an initial temporary contact key (tck), tck_0 = h_tck(rak), and the subsequent keys tck_i = h_tck(rvk || tck_{i-1}) for i ≥ 1, with h_tck a domain-separated hash function with a fixed-length output. the tcn is then generated as tcn_i = h_tcn(le_u16(i) || tck_i), with h_tcn a domain-separated hash function with a fixed-length output. in this scenario, everyone who knows rvk and tck_{i-1} can generate all subsequent tcns. in case of infection, a user can generate a report for the period j1 to j2 of the form rvk || tck_{j1-1} || le_u16(j1) || le_u16(j2) || memo, with memo a short byte string. the memo field can contain any message, for example self-reported symptoms. the user then produces a signature sig over the report (using rak) and provides report || sig to the backend. users can then verify source integrity by checking the signature over the report using the patient's rvk. the report key pair needs to be changed frequently in order to protect better against linkage attacks; a maximal report time span of hours or less is suggested. in this approach, each user can access the clear text of the patients' tcns. this makes the design vulnerable to linkage attacks as described in section and also to replay attacks, where an attacker re-advertises someone else's tcns. in our approach, private set intersection cardinality (as suggested in section . ) complicates linkage attacks even further and provides additional protection of the patient's identity and privacy. a sketch of the tcn derivation is given below.
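to make the key ratcheting described above concrete, the following sketch derives a chain of temporary contact keys and tcns from a report authorization key and a report verification key. the domain-separation labels, output lengths and the use of plain random bytes for the key pair are placeholder assumptions; the published specification uses a signing key pair and fixed hash personalizations.

```python
import hashlib
import os

def h_tck(data: bytes) -> bytes:
    # placeholder for the domain-separated hash h_tck
    return hashlib.sha256(b"H_TCK" + data).digest()

def h_tcn(data: bytes) -> bytes:
    # placeholder for the domain-separated hash h_tcn, truncated to 16 bytes
    return hashlib.sha256(b"H_TCN" + data).digest()[:16]

rak = os.urandom(32)   # report authorization key (stand-in for a signing key)
rvk = os.urandom(32)   # report verification key (stand-in for the matching public key)

def tck_chain(n: int):
    """tck_0 = h_tck(rak); tck_i = h_tck(rvk || tck_{i-1}) for i >= 1."""
    tck = h_tck(rak)
    chain = [tck]
    for _ in range(n):
        tck = h_tck(rvk + tck)
        chain.append(tck)
    return chain

def tcn(i: int, tck_i: bytes) -> bytes:
    """tcn_i = h_tcn(le_u16(i) || tck_i)."""
    return h_tcn(i.to_bytes(2, "little") + tck_i)

chain = tck_chain(5)
tcns = [tcn(i, chain[i]) for i in range(1, 6)]
print([t.hex()[:8] for t in tcns])
# a report for the period j1..j2 would contain rvk, tck_{j1-1}, j1, j2 and a memo,
# signed with rak, so that anyone can recompute tcn_{j1}..tcn_{j2} and verify the signature.
```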
another decentralized approach is presented in by the dp-3t group. dp-3t presents two approaches, one with lower cost and one with increased privacy. we summarize both approaches in the following. • low-cost decentralized proximity tracing: smartphones locally generate ephemeral identifiers (ephids, corresponding to the tcns in our approach), change them frequently and broadcast them via bluetooth. neighbouring smartphones store the observed ephids together with duration and a coarse time indication. a diagnosed patient gets authorization from the health authorities and uploads a representation of their ephids. users query the server and obtain the patients' ephids; the smartphone then computes the risk score and, if necessary, notifies the user. the secret key sk_t used for ephid generation is rotated every day t as sk_t = h(sk_{t-1}), with h a cryptographic hash function. at the beginning of each day t, the ephids are generated as ephid_1 || ... || ephid_n = prg(prf(sk_t, str)), using a pseudo-random function prf (e.g. hmac-sha256), a fixed public string str, and a stream cipher prg. each ephid is then broadcast for one minute. when a user is infected, the backend collects the patient's sk_t and the day t and provides the data to users. each smartphone then reconstructs the ephids of infected users and checks whether the user has been exposed. this check is limited to a single day, in order to increase efficiency for lookups and also to limit relay attacks (where an attacker redistributes captured ephids). the smartphone then determines the user's risk score and notifies the user in case the score exceeds a threshold. this approach also offers various additional functions, e.g., storing the country a user has visited to ensure interoperability between countries. also, the user can opt in to share data with epidemiologists to support research. location data or precise timing information will not be shared. the fact that the user receives a list of all the patients' ephids makes this approach vulnerable, e.g., to linkage attacks, where a patient's ephids could be linked to the patient's identity. • unlinkable decentralized proximity tracing: a second design provides better privacy properties; however, it requires the users to download larger volumes of data. the patients' ephids are not revealed to the users; instead, the ephids on the backend are hashed and stored in a cuckoo filter. this also allows infected users not to upload their data for sensitive locations or times. the general approach remains the same. smartphones generate and broadcast ephids. the ephid for each epoch i of a broadcast time period is generated from a per-epoch seed as ephid_i = truncate(h(seed_i)), with h a cryptographic hash function and truncate a truncation to a fixed number of bits. neighbouring smartphones observe an ephid and store h(ephid || i), with h a cryptographic hash function; the proximity, duration of the encounter and a coarse time indication (e.g. the day) are stored as well. when diagnosed, patients upload the pairs (i, seed_i) and can also choose for which epochs they want to reveal their ephids. the backend then computes h(truncate(h(seed_i)) || i) for each uploaded pair and inserts the result into a cuckoo filter. the filter is then sent to all users. the users' smartphones then apply the cuckoo filter to their stored hashes h(ephid || i) and can determine whether the user has been in contact. the risk score and notification remain the same as in the previous low-cost approach. this design also offers the possibility to opt in to share data with epidemiologists to support research. location data or precise timing information will not be shared. this design offers better protection of the infected users' identities. however, it requires the download of more data compared to the first, low-cost approach. still, the user has the cuckoo filter of all the patients' ephids locally on their phone. a tech-savvy attacker can also determine which specific entries of their observed ephids belong to infected users, e.g. by applying the cuckoo filter to each entry individually and checking for overlaps. our approach using private set intersection cardinality presented in section . provides additional protection of the users' identities. a sketch of the low-cost ephid derivation is given below.
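the sketch below illustrates the low-cost ephid derivation summarized above: a daily secret key that is ratcheted forward by hashing, and a batch of ephemeral identifiers derived from it with a prf. the dp-3t design uses aes in counter mode as the stream cipher; to keep this sketch dependency-free, a keyed-hash counter construction stands in for it, and the fixed public string and the number of ephids per day are assumptions.

```python
import hashlib
import hmac
import os

EPHID_LEN = 16
BROADCAST_STR = b"broadcast key"   # fixed public string (assumed value)

def next_sk(sk: bytes) -> bytes:
    """daily rotation: sk_t = h(sk_{t-1})."""
    return hashlib.sha256(sk).digest()

def ephids_for_day(sk_t: bytes, n: int):
    """ephid_1..ephid_n = prg(prf(sk_t, broadcast string)).

    prf is hmac-sha256; the prg here is a hash-counter stand-in for aes-ctr.
    """
    prf_out = hmac.new(sk_t, BROADCAST_STR, hashlib.sha256).digest()
    stream = b"".join(
        hashlib.sha256(prf_out + i.to_bytes(4, "big")).digest()
        for i in range((n * EPHID_LEN + 31) // 32)
    )
    return [stream[i * EPHID_LEN:(i + 1) * EPHID_LEN] for i in range(n)]

sk = os.urandom(32)            # initial secret key
sk = next_sk(sk)               # rotate to the next day
day_ephids = ephids_for_day(sk, n=96)   # e.g. one ephid per 15-minute epoch (assumed)
print(len(day_ephids), day_ephids[0].hex())
# an infected user reveals (sk_t, t); every phone can recompute these ephids
# for that day and compare them against its locally stored observations.
```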
an approach to protect against short-term and remote eavesdropping is also presented in : they propose the use of secret sharing, where each ephid is split into shares that are spread across several beacons. another user needs to receive a minimum number of shares in order to properly reconstruct the advertised ephid; thus, an attacker would need to stay close to the user for a certain period of time in order to receive enough shares. apple's and google's exposure notification protocol is presented, for example, in ; it assigns an interval number to each ten-minute time window. in this protocol, the interval number depends on when the key was first generated. each key (the temporary exposure key, tek) is valid for a fixed number of hours, corresponding to a fixed number of ten-minute intervals. the exposure notification interval number enin(t) counts the ten-minute intervals since the unix epoch, and the number of intervals since key generation is i = enin(t) - enin(t_keygen). the rolling proximity identifier (rpi) for the time t at which the identifier is calculated (unix epoch time) is computed by encrypting, under a key derived from the tek, a 16-byte padded data block: paddeddata[0..5] = utf8("en-rpi"), paddeddata[6..11] = zero padding bytes, paddeddata[12..15] = enin(t). they also offer the possibility to encrypt additional metadata along with the rpi. this metadata can then only be decrypted if the broadcasting user has been infected and has revealed their tek. a user who is infected and tests positive uploads their teks and the enin at which each key's validity started to a server. this upload can only be authorized by an official public health authority. the server collects the keys and distributes them to the users. each user then derives the infected person's rpis from the tek and the corresponding interval numbers. afterwards, they match each of the infected rpis against the encountered identifiers. they allow for a two-hour tolerance between when the rpi was supposed to be broadcast and the actual scan time. if the exposure exceeds a threshold (based on exposure time and proximity), the user receives a notification. in the current design, users have no direct access to the encountered rpis and the infected rpis. only a user with root access to the phone could possibly access this information and perform a linkage attack. rebroadcasting other users' rpis (if accessible with root access) would be possible within the two-hour window. this provides better protection than allowing attackers to directly access the patients' rpis and the encountered rpis. nonetheless, for effectively containing epidemic spread, second order contact tracing (see section . ) needs to be introduced; only then can an infection chain actually be interrupted. in , a decentralized approach is presented which also exchanges tcns via bluetooth. in case of infection, the patient uploads the seed with which the advertised tcns were computed to a server, where the patient's tcns are regenerated. the approach includes private set intersection cardinality based on diffie-hellman private set intersection as a means to avoid linkage attacks, similar to our proposed solution in section . . however, to actually stop the spread of sars-cov- , second order tracing needs to be realized as well (see section . ). the last approach we summarize, robert, was developed within the pan-european privacy-preserving proximity tracing (pepp-pt) initiative; it is the only centralized protocol that we summarize in this paper. in this approach, the server keeps a record in a database for each registered app belonging to each user. for user u, this record comprises, amongst other data, a permanent identifier id_u, which is assigned to each registered app and known only to the server, a shared key k_u, and a list of exposed epochs. with robert, the tcns are generated by the server and sent to the app. the ephemeral bluetooth identifier (ebid) of user u for epoch i is generated as ebid_{u,i} = enc(k_s, id_u || i), with k_s a server key stored by the server and enc a block cipher with a 64-bit block size.
additional to the , each user also adds an encrypted country code and the time the message is broadcasted: with mac as an hmac-sha ( , | , ) where is the prefix " ". upon receiving , another app retrieves from and obtains a timestamp , . the app then verifies that with as time tolerance (e.g. some seconds). if this is correct, the app stores ( , , ) in its proximity list. if a user is infected, they can upload their proximity list for the time period where they have been infected to a server , . the server verifies the uploaded data and checks whether a user is at risk to have been infected. it calculates a "risk score" depending on how long and how close a user has been with another infected person. user query status requests from the server regularly and get thus notified if their risk score exceeds a threshold. in this centralized approach, both server and user know whether they have had been in contact with an infected person. the tool is designed to work for different countries. if a server collects data with an country code from another country, it forwards the data to the respective server. the server can link back the temporary identifiers to the permanent unique identifier linked to each user. the deanonymization of each user and also tracing users over time is thus trivial . the users' contacts and social graphs are revealed to the server and an attacker with access to the server can exploit this sensitive information. several other attacks are possible, such as linkability of contacts. a detailed security analysis can be found for example in , and . they conclude that this approach reveals many opportunities for exploitation and systematic misuse. this section covers legal aspects, in particular the regulations in accordance with data protection law . insofar as the use of the dcts app involves the processing of personal data, it must be compliant with the strict requirements of the gdpr. this applies regardless of whether the app is operated by a public authority (at federal or state level) or a private institution. essentially, the following questions arise: the gdpr only applies to the processing of personal data, art. sec. gdpr. conversely, the gdpr does not apply to purely factual or anonymous data. art. sec. gdpr defines personal data as "any information relating to an identified or identifiable natural person ('data subject'); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person". it follows that the key threshold for determining whether data should be considered as personal data is not identification but rather identifiability of a specific natural person. recital gdpr stipulates that "[p]ersonal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person. [. . . ] the principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. 
this regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.". against this background, there is a strong indication that the gdpr would be applicable to the dcts app. it cannot be ruled out that individual pieces of information in the processing chain will be personally identifiable by the use of additional information. this applies both to the advertised tcns (which could in principle be attributed to a natural person by the use of the respective keys), the keys themselves, the -implicit -attribute "tested positive" as well as the determination of the "contact". in addition, ip addresses as well as mac addresses are typically considered as information on an identifiable natural person. this personal data will be processed, i.e. stored, uploaded and matched, at various stages during the operation of the dcts app. conclusion: personal data are processed and the gdpr is applicable. pursuant to article gdpr, any processing of personal data requires an explicit authorisation. art. gdpr provides a whole range of possible justifications: from the consent of the data subject (i.e. the affected person) to a specific statutory regulation or a balancing of interests test. to the extent that health data are concerned (as would be the case with the -implicit -attribute "tested positive"), even stricter requirements must be met under art. gdpr. however, parallel with art. sec. subsec. a. gdpr, art. sec. subsec. a. gdpr provides a justification if the effective consent of all those affected is obtained. as laid out above in section , the dcts app is conceived to function on a voluntary basis. this means that there should neither be a statutory obligation to use the app nor an automated implementation of the app on all end devices. in fact, the voluntary nature is a crucial factor for achieving widespread acceptance and trust among the population for any digital contact tracing system. it should be noted that some commentators have raised doubts whether a dcts app can realistically be regarded as strictly voluntary because it may at least foster some form of indirect compulsion if the use of the dcts app becomes a de-facto condition for taking part in public and social life. it is true that it cannot be ruled out that a private individual (e.g. a restaurateur) may practically "compel" its contractual partners to use the dcts app. therefore, it is all the more important that the non-use of the dcts must not effectuate any weakening of an individual's legal position vis-à-vis the state. against this background, the european data protection board has already made clear that " [t] he use of such an application [. . . ] may not condition the access to any rights guaranteed by law." this means that, in principle, the dcts app's processing of personal data may be justified through formal consent of each and every individual app user pursuant to sec. subsec. a. gdpr, art. sec. subsec. a. gdpr. however, it should be born in mind that the declaration consent must meet certain requirements. these are set out in art. gdpr (general conditions) and additionally in art. gdpr (particular conditions as regards minors) and art. gdpr (transparency conditions). 
these provisions in particular stipulate that • the declaration of consent must be explicit ("opt-in" in the context of the installation of the app on the smartphone); • the declaration of consent must be free of any kind of compulsion (despite the urgency, no psychological pressure must be created that leaves the individual hardly any real choice); • sufficient, comprehensible information must be provided in easy language appropriate for the addressee, on the basis of which the addressee can form their decision ("informed consent"). this requirement is not at all trivial because, on the one hand, precise information must be provided about the purpose, means and all processing steps and circumstances, and, on the other hand, the addressee must not be overburdened. conclusion: if the requirements laid out above (and a few formalities not explicitly mentioned here) are complied with, the dcts app's processing of personal data may be justified under the gdpr. (notes: ip addresses are typically considered online identifiers and mac addresses device identifiers; in addition, the transmission of personal data may also occur within the framework of the bluetooth connection itself. the european data protection board (edpb) is an independent european body which contributes to the consistent application of data protection rules throughout the european union and promotes cooperation between the eu's data protection authorities; it is composed of representatives of the national data protection authorities and the european data protection supervisor (edps).) in addition, a whole range of procedural precautions must be observed in the process of developing the dcts app. these include in particular • fulfilling certain information obligations (art. ff. gdpr), in particular with regard to the rights of data subjects, such as the right to information, the right to revoke consent, the right to deletion, and the right to correction (e.g. when a "false-positive" test result is entered into the system); • ensuring data protection through technology design and data protection-friendly default settings (art. gdpr); • ensuring it security (art. gdpr); • providing proper information in case of a data breach (art. gdpr); • carrying out a data protection impact assessment (art. gdpr); • and several more. conclusion: none of these requirements stand in the way of developing the dcts app, but they need to be incorporated from the outset of the design process to guarantee a compliant and thus sustainable use. in principle, the processing of personal data constitutes an encroachment on the fundamental right to informational self-determination. even if this invasion can be justified (in this case: by consent, art. sec. subsec. a. gdpr, art. sec. subsec. a. gdpr), it is nonetheless subject to the principle of proportionality. this means that a legitimate purpose must be pursued through the use of the dcts app and that the means used must be suitable, necessary and appropriate for achieving the purpose. the principle of proportionality must equally be observed in the case of "voluntary" (i.e. consented) app use, especially since the boundaries between voluntariness and compulsion may become blurred in the case of "urgent recommendations" on the part of the federal government as well as state governments and affiliated public institutions.
the purpose of this app is identifying chains of contacts to contribute to slowing down the spread of the virus ("flatten the curve") in order to avoid overburdening the healthcare system while easing exit and contact restrictions. in this way, a balance between health protection and restrictions on fundamental rights should be achieved. at present, we still live under considerable restrictions on our fundamental rights (freedom of occupation, property, freedom of assembly, freedom of movement, freedom of religion, etc.). the app is intended to help to reduce these restrictions, but at the same time provide sufficient health protection. this is undoubtedly a legitimate, and from a fundamental rights point of view, even welcome purpose. the means used is a warning and protection system based on a disclosure of the fact of positively tested contacts, in order to give the persons affected the opportunity to take protective measures for themselves and third parties. in principle, this means is suitable to fulfil the purpose. because of the voluntary nature of the system, it is currently unclear whether the dcts app will ultimately be successful. from a constitutional point of view, the principle of proportionality is only violated if a proposed means is "utterly unsuitable" (according to the german federal constitutional court in its settled case law). since it cannot be ruled out that the dcts app will fulfil its purpose, it currently meets the suitability threshold. nevertheless, further evaluating the efficiency of digital contact tracing by refining the statistic models is a key component of the contactum group's dcts research efforts, not least in order to further inform the dynamic proportionality assessment. the next step is to assess whether the concrete setup of the dcts app is necessary to achieve its purpose. this would not be the case if there were milder means that would be equally suitable to fulfill the purpose. such a milder means is not immediately apparent: continuing the lockdown would make the app unnecessary, but would prolong the considerable restrictions on some eu member states have started a process of gradually easing some of the restrictions imposed. nevertheless, the restrictions remain fairly unparalleled, both in terms of their range and comprehensiveness. in several recent decisions, the german constitutional court has made clear that under the current pandemic the scale will typically tilt in favor of the protection of public health when balancing the competing protected legal interests, see bverfg, april , bvr / ; bverfg, april , bvq / ; slightly more in favor of the freedom of assembly bverfg, april , bvr / . there seems to be widespread consensus that processing location and/or movement data is not necessary to fulfill the purpose of digital contact tracing systems, i.e. identifying chains of contacts to contribute to slowing down the virus. other fundamental rights and is therefore not a preferable alternative. relaxing the current contact and exit restrictions without installing a digital contact tracing system seems equally risky, because it is to be feared that people are not sufficiently sensitive to necessary self-restrictions. however, in terms of necessity, there may be a crucial distinction between the two fundamental approaches to digital contact tracing, i.e. centralized and decentralized data reconciliation: • the decentralized approach (as chosen in our proposed dcts app) could be viewed as a milder remedy compared to the centralized approach. 
this is because with central data reconciliation, all advertised tcns would regularly be uploaded and stored together on the authentication server, at least for a short time. thus, the risk of re-identification of affected persons may be greater. this central server would create a special attack vector, which raises questions of it security. • in the same vein, the decentralized approach seems more in line with the aforementioned principle of data minimization -which is, in essence, a materialization of the principle of proportionality. in the decentralized approach, only the positive cases are stored on the central server, while the actual data reconciliation takes place on the app on the end device. from the perspective of an app user, this aspect could increase the willingness to participate in a digital contact tracing system. • although there are currently no plans to combine the use of the dcts app with compulsory protective measures, such as home quarantine or testing, some people may fear that the fact that they have tested positive could become known to third parties. in the decentralized approach, this fact would only be visible on the smartphone of the person concerned, i.e. it would be "in their hands". from a necessity point of view, this could also speak in favour of the decentralized variant. • achieving widespread acceptance and trust among the population is the key factor for any digital contact tracing systems's success. only if the potential users trust in the app's privacy architecture, they will actually use and follow the app in their daily lives. against this background, it is crucial to obtain the endorsement of trustworthy institutions like the data protection agencies. in their letter to the european commision of april , the edpb has made clear that "[i]n any case, the edpb wants to underline that the decentralised solution is more in line with the minimisation principle." conclusion: assuming that centralized and decentralized approaches to designing a digital contact tracing system are equally suitable to contribute to slowing down the spread of the virus, there are good arguments that a decentralized approach (as chosen in the present case with the dcts app) is the preferable approach both from a proportionality and data minimization point of view. on may , a group of data protection experts published a legislative proposal for a german "law on the introduction and operation of an app-based tracing of infection risks with the sars-cov- (corona) virus" (the dcts law proposal). the dcts law proposal covers the requirements, framework and limits for the installation, use and operation of a dcts app (sec. subsec. ). with regard to germany, the authors argue that such a statutory basis is required for any digital contact tracing system. the dcts law proposal lays out that the purpose of any dcts app is to foster contact tracing within the framework of the german federal infection protection act and to enable earlier infection warnings (sec. subsec. ). in addition, users shall be enabled to take additional measures to limit further spreading of the pandemic and thereby prevent an overburdening of the of the healthcare system capacities (sec. subsec. ). sec. stipulates the voluntary nature of the app. the installation or use of the app may not be brought about either directly or indirectly by any form of compulsion. it is also forbidden to link any direct or indirect advantages or disadvantages to the use or non-use of the app. 
in the event of a notification of a potentially infectious encounter, there is no obligation to enter into (self-)quarantine, to undergo a medical test for covid- or to notify third parties. in addition, users must be able to deactivate, terminate or delete the app at any time. pursuant to sec. , the app as well as the server infrastructure necessary for its operation are provided and operated by the robert koch institute. sec. specifies the technical details for establishing potentially infectious encounters as well as for the data storing and processing. sec. stipulates that any data (tcns, keys as well as any meta data) created, advertised or received by the dcts app must not be used for any purposes other than those specified in the dcts law proposal. sec. requires a data protection impact assessment (cf. art. gdpr), sec. requires that the source code of the app be published. sec. prescribes that any user notified of a potentially infectious encounter the user shall be entitled to an immediate medical test for covid- . sec. prevents potential dcts app operators from blocking the development of alternative dcts apps by third parties. in case a medical test for covid- is carried out, sec. provides the users with the additional option of transmitting their test results to the dcts app operator for research purposes. the data transmitted may only be used by the operator to analyse and improve the accuracy of the matching process. finally, the dcts law proposal shall cease to have effect when the german bundestag finds that an "epidemic situation of national significance" no longer exists, but no later than one year after its entry into force (sec. subsec. ). the dcts law proposal enhances the evolving academic discussion around digital contact tracing systems. whether such legislation should actually be passed remains to be discussed. many of the provisions have a merely declaratory value and are somewhat characterized by a lack of trust in public institutions. this lack of trust may be attributable to the (perceived) political contention of the last weeks (centralized vs. decentralized approach, voluntary nature vs. compulsion, confusion surrounding the parallel release of a data donation app, etc.). however, it is questionable whether such contention is sufficient to require legislative action. as regards the dcts law proposal, four dimensions should be distinguished: • provisions such as sec. (voluntary nature) have a merely confirmatory and declaratory character. even without such a provision, the processing of personal data would require a justification under the gdpr. art. sec. subsec. a. gdpr, art. sec. subsec. a. gdpr provides for such a justification if the effective consent of all those affected is obtained (see above under . ). such consent must, in accordance with art. subsec. gdpr, not only be voluntary, but revocable at any time. the same goes for the provisions on deletion, purpose limitation and data protection impact assessment (secs. [ ] [ ] [ ] , which can equally be derived from the gdpr. • the specification of the technical procedure (sec. ) is redundant in the sense that it adopts a preliminary consensus reached in the scientific and political debate. putting such a (temporary) consensus into law may make it difficult to adapt the technological standard to new findings. • provisions such as sec. (assigning the robert koch institute as operator) have more than declaratory effect, as such organizational determination is typically not provided for by law. • secs. 
- clearly go beyond applicable law: neither the obligation to publish the source code nor the right to an immediate medical test for covid- nor the requirement to approve alternative apps result from previous legislation. from a legal policy perspective, it remains to be discussed whether these regulations are desirable. however, one aspect of the dcts law proposal indeed seems counterproductive: the proposed statute's automatic expiry after one year (sec. subsec. ) will not prevent the app itself from being continuously used (for example on the basis of consent, see above). on the contrary: only the restrictive regulations, such as those on rigorous earmarking, would then no longer apply. conclusion: all in all, the dcts law proposal is an important contribution to the academic discussion and could contribute to the overall acceptance of a digital contact tracing system. however, there is a risk that initiating a legislative process at this stage may further delay the introduction of the dcts app. alternatively, similar effects may be achieved by other public trust-building measures (such as the publication of "best practices" by the federal government): "law in action" instead of "law in books". an overarching, general legal framework for similar applications aiming at nationwide risk prevention could then be created in a next step. such an approach could also counteract the criticism that the dcts law proposal constitutes an impermissible "single case" legislation. at any rate, parliamentary involvement in this complex matter is certainly to be welcomed. we present an improved decentralized, privacy preserving approach of a digital contact tracing service. we protect the users' identities and their personal data and focus on privacy and it-security concerns. we incorporate legal principles and requirements, such as the gdpr or the council of europe's convention , into the very design of our solution. in the decentralized approach, each infected user has the choice to contribute to fighting the pandemic and provide their advertised random ids. this information concerns the user themselves and nothing can be inferred about other people. the worst case scenario of a potential security breach is the identification of infected users. in the centralized approach, each infected user can upload their observed ids. this information is not only concerning the infected user who is revealing their social graph. this information concerns also every person the infected user has met in the past two/three weeks, as at least parts of their social graphs are revealed as well. the worst case scenario of a potential security breach now spans an entirely different magnitude. a variety of information can be obtained from a social graph with severe consequences for the individual. social graphs of people can lead to identification of members belonging to groups of minorities (religious or otherwise) or even uncovering and endangering e.g., journalistic sources, whistle-blowers or political activists. they might reveal educational and social status, political circles and opinions, religious believes, and further sensitive information about people. the tracing of infectious contacts, digital or not, is an epidemiological tool that works effectively only if coupled with the ability to test potentially infected people quickly. besides, the impact of tracking on epidemic containment depends on many factors that can also change over time. 
for this reason, the appropriate use of contact tracing must be permanently monitored and optimized through the use of proper epidemiological models. south korea is reporting intimate details of covid- cases: has it helped more scary than coronavirus': south korea's health alerts expose private lives unbalanced private set intersection cardinality protocol with low communication cost epidemiological parameters of coronavirus disease : a pooled analysis of publicly reported individual data of cases from seven countries transmission of -ncov infection from an asymptomatic contact in germany contactum consortium . digital contact tracing and its impact on the sars-cov- pandemics. tba, work in how reliable is smartphone-based electronic contact tracing for covid- contact tracing -cryptography specification the dp- t project . decentralized privacy-preserving proximity tracing tor: the second-generation onion router. tech. rep., naval research lab washington dc mobile private contact discovery at scale an improved algorithm for computing logarithms over gf(p) and its cryptographic significance (corresp.) estimating the asymptomatic proportion of novel coronavirus onboard the princess cruises ship sars-cov- asymptomatic and symptomatic patients and risk for transfusion transmission estimation of the asymptomatic ratio of novel coronavirus infections (covid- ) unified research on privacy-preserving contact tracing and exposure notification exposure notification -framework documentation exposure notification -cryptography specification android exposure notification api documentation epione: lightweight contact tracing with strong privacy robust and privacy-preserving proximity tracing overview of sample mobile application covid notions: towards formal definitions -and documented understanding -of privacy goals and claimed protection in proximity-tracing services pepp-pt data protection architechture -security and privacy analysis the dp- t project . robert -security and privacy analysis orientierungshilfe zu den datenschutzanforderungen an app-entwickler und app-anbieter microsoft mobile device users corona-apps" und zivilgesellschaft: risiken, chancen und rechtliche anforderungen. guidelines / on the use of location data and contact tracing tools in the context of the covid- outbreak ds-gvo prüfsteine für die beurteilung von vorschlag für ein gesetz zur einführung und zum betrieb einer app-basierten nachverfolgung von infektionsrisiken mit dem sars-cov- (corona) virus engeler m. warum wir ein corona-tracing-gesetz brauchen we improve existing concepts through the use of private set intersection which provides better privacy for the infected users. additionally we show a way to trace second order contacts in a decentralized system while protecting the privacy of the user. this approach also prevents attackers from uploading contacts without prior contact to an infected person. key: cord- -m f de authors: trivedi, amee; zakaria, camellia; balan, rajesh; shenoy, prashant title: wifitrace: network-based contact tracing for infectious diseases using passive wifi sensing date: - - journal: nan doi: nan sha: doc_id: cord_uid: m f de contact tracing is a well-established and effective approach for containment of spread of infectious diseases. while bluetooth-based contact tracing method using phones have become popular recently, these approaches suffer from the need for a critical mass of adoption in order to be effective. 
in this paper, we present wifitrace, a network-centric approach for contact tracing that relies on passive wifi sensing with no client-side involvement. our approach exploits wifi network logs gathered by enterprise networks for performance and security monitoring and utilizes it for reconstructing device trajectories for contact tracing. our approach is specifically designed to enhance the efficacy of traditional methods, rather than to supplant it with a new technology. we design an efficient graph algorithm to scale our approach to large networks with tens of thousands of users. we have implemented a full prototype of our system and deployed it on two large university campuses. we validate our approach and demonstrate its efficacy using case studies and detailed experiments using real-world wifi datasets. approaches use bluetooth for proximity sensing, sometimes in combination with gps and other locationing techniques present on the phone for location sensing [ ] . in this paper, we present an alternative network-centric approach for phone-based contact tracing. in contrast to client-side approaches that depend on the use of bluetooth and mobile apps a network-centric approach does not require data collection to be performed on the device or apps to be downloaded by the user on the phone. instead, users use their phone or mobile device normally and the approach uses the network's view of the user to infer their location and proximity to others. our approach is based on wifi sensing [ , ] and leverages data such as system logs ("syslogs") that are already generated by the enterprise wifi networks for contact tracing. although our approach does not require the use of wifi location [ ] , such techniques, where available, can further enhance the efficacy of our approach. our network-centric approach to contact tracing offers a different set of trade-offs and privacy considerations than bluetooth-based client-centric methods; one of the goals of our work is to carefully analyze these tradeoffs. the following scenario presents an illustrative use case of how our approach works. consider a student who visits the university health clinic and is diagnosed with an infectious disease. the university health clinic officials decide to perform contact tracing and seek the consent of the student for network-based contact tracing. since the user could have transmitted the disease to others over the past several days it is important to determine what campus buildings and specific locations within each building were visited by the student during that period and which other users were in the proximity of the student during those periods. the health officials input the wifi mac address of the student's phone into the network-centric contact tracing tool. the tool analyses wifi logs generated by the network, and specifically association and dissociation log messages for this device, at various access points on campus to reconstruct the location(building, room numbers) visited by the user. it further analyzes all other users who were associated with those access points at those times to determine users who were in proximity of the user and for how long. this location and proximity reports are used by health officials to assist with contact tracing. additional reports for each impacted user can be recursively generated. 
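as a rough illustration of how wifi association logs could be turned into the per-device visit information used in the scenario above, the sketch below parses simplified, already-normalized records into per-device (access point, start, end) sessions and produces a basic location report. the log format, field names and event names are hypothetical; real controllers emit vendor-specific messages that would need to be normalized first.

```python
from collections import defaultdict

# hypothetical, already-normalized log records: (unix_time, device, ap, event)
LOG = [
    (1000, "dev-42", "AP-Lib-1", "assoc"),
    (2800, "dev-42", "AP-Lib-1", "disassoc"),
    (2900, "dev-42", "AP-Caf-3", "assoc"),
    (5300, "dev-42", "AP-Caf-3", "disassoc"),
    (1200, "dev-77", "AP-Lib-1", "assoc"),
    (2500, "dev-77", "AP-Lib-1", "disassoc"),
]

def build_sessions(records):
    """collapse assoc/disassoc pairs into per-device (ap, start, end) sessions."""
    open_assoc = {}                       # (device, ap) -> start time
    sessions = defaultdict(list)          # device -> [(ap, start, end)]
    for t, device, ap, event in sorted(records):
        if event == "assoc":
            open_assoc[(device, ap)] = t
        elif event == "disassoc" and (device, ap) in open_assoc:
            start = open_assoc.pop((device, ap))
            sessions[device].append((ap, start, t))
    return sessions

def location_report(sessions, device):
    """locations (aps) visited by a device, with arrival and departure times."""
    return [(ap, start, end) for ap, start, end in sessions.get(device, [])]

sessions = build_sessions(LOG)
print(location_report(sessions, "dev-42"))
```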
in designing, implementing, deploying, and evaluating our network-centric contact tracing tool, our paper makes the following contributions: • we present a network-side contact tracing method that involves passive wifi sensing and no client-side involvement. we discuss why such an approach may be preferable in some environments, such as academic or corporate campuses, over client-side methods. • we present a graph-based model and graph algorithms for efficiently performing contact tracing on passive wifi data comprising tens of thousands of users. • we implement a full prototype of our system and deploy it on two large university campuses in two different continents. • we validate and experimentally evaluate our approach using anonymized data from two large university networks. our results demonstrate the efficacy of contact tracing for three simulated diseases and highlight the need to judiciously choose wifi session parameters to reduce both false positives and false negatives. through case studies, we show the efficacy of judicious iterative contact tracing while avoiding an exponential increase in co-located users who need to be traced, and also evaluate our approach for normal campus mobility patterns and mobility patterns under quarantine. we show that our graph-based approach can scale to settings with tens of thousands of users and also present the limitations of using wifi sensing for contact tracing. in this section, we provide background on contact tracing and present motivation for our network-centric approach. (figure: example wifitrace location report and proximity report, listing the locations a device visited with arrival and departure times.) contact tracing: contact tracing is a well-established method that is used by health professionals to track down the source of an infection and take pro-active measures to contain its spread [ ] . the traditional method is based on questionnaires -upon diagnosis, the user is asked to list places visited and other people with whom they have had contact, and this information is used to iteratively contact these individuals, and so on [ ] . the goals of contact tracing are two-fold: identify the potential source of infection for the diagnosed individual and determine others who may have gotten infected due to proximity or contact. since there is often a multi-day incubation period between the time of infection and the onset of the illness, infected users often need to rely on their recollection of where they have been over multiple days or weeks, a process that can be error-prone due to gaps in memory. the manual process is challenging to scale up to larger numbers of users, especially for larger outbreaks of disease.
many standalone contact tracing apps have also implemented this approach, which also involves having each phone upload its collected data to a server for contact tracing analysis [ , , , , ] . we note that such a client-centric approach requires a user to first download a mobile app before contact tracing data can be gathered-users who have not downloaded the app (or have opted in) are not visible to other phones that are actively listening for other devices in their proximity. thus the overall effectiveness of the approach depends on the level of user adoption. this is seen as a key hurdle from the experience of singapore's tracetogether app [ ] , which has seen only . million downloads despite needing a critical mass of million active users (around two-thirds of the population) to be effective [ ] . health experts have argued that while technology-based contact tracing solutions are useful, they should not be seen as a replacement for traditional means of contact tracing, which is still an effective approach [ ] . our network-centric approach is designed to address these issues. first, it is designed to help health professionals improve traditional contact tracing methods, rather than supplant manual contact tracing using technology. our network-centric tool is designed to integrate into health professional's contact tracing workflows; unlike some bluetooth apps, they are not designed for end-users to self-monitor their proximity to infected users. second, a network-centric approach overcomes the critical mass adoption hurdles faced by bluetooth approaches-since it is based on passive wifi sensing that does not require any app to be downloaded by users or require active client participation. with near-ubiquitous availability of wifi in environments such as offices, university campuses, and shopping areas, wifi sensing has emerged as a popular approach for addressing a range of analytic tasks [ , ] . wifi sensing can be client-based (i.e. done on the mobile device) or network-based (i.e. done from the network's perspective). performing triangulation via rssi or time of flight measurements to multiple wifi access points to localize a device's position is an example of client-side wifi sensing [ ] . in contrast, network-centric wifi sensing involves using the network's view of one or more devices to perform analytics. the approach has been used for monitoring the mobility of wifi devices by analyzing the sequence of the access points that see the same device over a period of time [ ] . while mobility characterization and modeling using wifi sensing has seen more than a decade of research [ , ] more recent-work has leveraged wifi sensing for a range of analytic tasks such as tracking health [ ] , stress [ ] , retail analytics [ ] and more. we build on this prior body of work and focus on the network-centric approach for contact tracing. the key premise of the approach is that the mobility of a user's phone is visible to the network through the sequence of wifi access point associations performed by the device as the user moves, which allows the network to determine the locations visited by the users' device and other co-located devices that were present at those locations by virtue of being associated with those aps. thus, the approach relies on passive wifi sensing by passively observing devices as they move through the network. 
there are some key advantages of such an approach over a client-centric approach, unlike a client-based approach that needs a critical mass of users to opt-in or download an app before proximity can be effectively determined, the wireless network can "see" all devices that are connected (associated) to it at all times. hence, a network-centric method is easier to deploy and scale to large numbers of users without any initial deployment hurdles. second, the client-centric approach involves data collection on each device for proximity sensing. by its very nature, a network-centric approach does not require any data to be collected on the device. in many cases, the approach may not even require an additional data to be collected by the network. this is because this method relies on syslog of network events, snmp reports, or rtls events that are routinely logged by many enterprise networks for purposes of performance and security monitoring. our network-centric approach "mines" this already logged data for performing contact tracing. of course, our approach does require network logging of ap events by the network if this information is not already being logged. third, a client-centric method uses bluetooth for proximity sensing and must use a second sensing modality such as gps for sensing location where those devices were seen. in contrast, a network-centric approach can use a single modality -wifi sensing -to determine the location (based on the ap locations) and proximity (based on ap associations). note that methods like gps do not work well inside buildings, while passive wifi sensing can provide ap-level locations of users even without any additional wifi locationing technology. however, the approach is not without challenges. bluetooth-based approaches claim to sense other devices that are within a few feet of the user, which is then used for proximity analysis. although the use of bluetooth to coarse-grain proximity measurements (e.g., users co-located within the range of an access point). coarse-grain proximity sensing can increase false positives, and hence the approach uses the duration of proximate co-location as an indicator of risk of infection and the duration of proximate co-location can be determined accurately (same as bluetooth). moreover, since we designed our approach to enhance traditional contact tracing, rather than replace it, coarse-grain proximity information, along with co-location duration, is still useful to health professionals for identifying users who should be subjected to traditional contact tracking checks. wifi-based contact tracing only works in areas with wifi coverage -which are largely indoor spaces and a few key outdoor spaces. this method does not work outdoors where no wifi coverage is available. in contrast, bluetooth methods work "everywhere"-both indoors and outdoors-since they involve listening to other devices and do not depend on a network. while this is a key limitation of a network-centric approach, they are nevertheless effective in university campuses or corporate environments where employees spend a significant portion of their day. finally, all contact tracing methods, whether client or network-based raise important privacy concerns. however, privacy considerations of network-based methods are different from those of bluetooth-based client methods [ ] . we discuss these in detail in section § and show how user privacy can be safeguarded in such methods. 
the deployment of network-centric contact tracing technology raises privacy issues, which we discuss in section . ethical considerations that came up during the design of this technology are discussed here. data collection for experimentally validating the efficacy of our approach has been approved by our institutional review board (irb) and is conducted under a data usage agreement (dua) with the campus network it group that restricts and safeguards all the wifi data collected. to avoid any privacy data leakage all the mac ids and usernames in the syslogs are anonymized using a strong hashing algorithm. the hashing is performed before syslog data is stored on disk under the guidance of the it manager who is the only person aware of the hash key of the algorithm. any data analysis that results in the de-anonymization of the users is strictly prohibited under the irb and signed dua. users on the usa campus involved in the data collection consent to an acceptable use it policy, that permits the campus it department to collect network-level syslog data events for a system diagnosis or analysis of cyber-attacks on the enterprise network. additionally, all researchers sign a form of consent to adhere to the signed irb and dua and undergo mandatory ethics training. in short, the data used to validate and evaluate our approach prior to its actual deployment is anonymized and subjected to multiple safeguards as part of an irb-approved study. this section presents an overview of our approach, followed by the details of our graph-based contact tracing algorithm. fig depicts the architectural overview of our contact tracing system. the system uses a three-tier pipelined architecture. the data collection tier uses network logging capabilities that are already present in enterprise wifi systems to collect the wifi logs of device associations to access points within the network. many enterprise it administrators already collect this data for network monitoring, in which case this data can simply be fed to the next tier in the pipeline. otherwise, the it admins need to turn on logging to start gathering this data. the next tier in the pipeline ingests this raw data and converts it into a standard intermediate format. in other words, this tier performs pre-processing of the data. since the raw log files will have vendor-specific formats, this tier implements vendor-specific pre-processing modules that are specific to each wifi manufacturer and its logging format. this tier processes log files in batches every so often and generates data in intermediate form. our final tier ingests the data produced by the vendor-specific pre-processor and creates a graph structure that captures the trajectories of user devices. it exposes a query interface for contact tracing, for each query, it uses the computed trajectories over the query duration to produce (i) a location report listing locations visited by the infected user and ( ) a proximity report listing users who were in proximity of that user and for how long. as discussed below, this tier uses time-evolving graphs and efficient graph algorithms to efficiently intersect trajectories of a large number of devices (typically tens of thousands of users that may be present on a university campus) to produce its report. consider a wifi network with n wireless access points that serves m users with d devices. 
we assume that the n access points are distributed across buildings and other key spaces in an academic or corporate campus and that the location of each access point (e.g., building, floor, room) is known. large enterprises such as a residential university will comprise thousands of access points (our work is based on deployment and data from two large overview : user name : janedoe start time : : am /jan/ end time : : pm /jan/ showing all locations visited for mins or higher visit details : overview : user name : janedoe start time : : am /jan/ end time : : pm /jan/ displaying co-located users in descending order of total co-location time. number of users co-located : alice ... . an example contact tracing report produced by our tool: (a) patient report (b) proximity report universities, one based in the northeastern usa that comprises access points and one based on singapore that comprises , access points). the number of users and devices seen in such networks is typically in the order of tens of thousands. to manage such a large network, enterprise wifi networks uses controller nodes that have the capability to administer and manage the aps and the network traffic, along with detailed logging and reporting capabilities. as a user moves from one location to another, their mobile device (typically a phone) associates with a nearby access point. since the locations of aps (building, floor, room) is known, the sequence of ap associations over the course of a day reveals the trajectory of the user and the visited locations. to reconstruct this trajectory we assume that the wifi network logs contain association and disassociation events as seen by each ap. typically this information is of the form: timestamp, ap mac address, device mac, optional user-id, event-type, where event-type can be one of association, disassociation, reassociation, authorization, and unauthorization. typically when a device switches to a new ap due to user mobility, this is visible to the network in the form of disassociation with the previous ap and an association with a new ap. given this log information, contact tracing of a user involves two steps: ( ) determine all aps visited by the user in the specified time period and ( ) determine all users who were associated with each of those aps concurrently with the infected user. to do so, we can analyze the log to first construct the time-ordered sequence of ap sessions of the concerned device (a session is the time period represented by an association followed by a disassociation). since ap locations are known, this session list represents the location visited by the user and the time duration. next, for each ap session in the above user trajectory, we can analyze the log to determine overlapping sessions of all other users at that ap. these are users (i.e. their devices) who were present in the proximity of the infected user. of course, the wifi log does not reveal the distance between the two users or whether physical contact occurred. nevertheless, it enables us to determine users at risk by computing the duration for which the two users were in proximity of one another. in some cases, the location where they were co-located may reveal the degree of risk (e.g., a hour long meeting in a small conference room or a lecture classroom). to enable health ... 
professionals to further assess the risk during contact tracing, we generate a location report, showing locations visited by the user and for how long as well as a a proximity report of co-located users at each location and the duration of co-location. figure depicts a sample report resulting from the process. since an enterprise network with thousands of aps and tens of thousands of devices will generate very large log files (for example, the log file from one of our campuses contains more than billion events over a month semester period). scanning the log to compute the location and proximity can be slow and inefficient. consequently, we present an efficient graph-based algorithm based on time-evolving graphs in the next section. to efficiently process contact tracing queries, we model the data as a bipartite graph between devices and aps. each device in the wifi log is modeled as a node in the graph; each ap in the network is similarly modeled as a node. an edge between a device node and an ap node indicates that the device was associated with that ap. each edge is annotated by the time interval (t , t ) that denotes start and end times of the association session between that device and the ap. note that data is continuously logged to the log files, which causes new edges to be added to the graph as new associations are observed and new nodes to be added as new devices are observed in the logs. thus, our bipartite graph is a time-evolving graph. for computational efficiency, each device and ap node in the graph is limited to a time duration, say an hour or a day. this is done to limit the number of edges incident on each node, which can keep growing as device associate with new aps or aps see new association session. as a result of associating a time duration with each node, each device or ap is represented by multiple nodes in the graph, one for each time duration where there is activity. in this case, we can view the node id as the mac address concatenated by the time duration. for example, mac [ : , : ], mac [ : , : ], represent two nodes for the same device, each capturing ap association edges seen within that period. in case of ap nodes, this would capture all device association to that ap within those time periods (see figure ). the duration for partitioning each node's activity in the graph is a configurable parameter, and this duration can chosen independently for a device node and an ap node if needed. given such a bi-partite graph, a contact tracing request is specified by providing a device mac address and a duration (t st ar t , t end ) over which a contact trace report should be generated. the query also takes a threshold τ that specifies only ap sessions of duration longer than τ should be considered. the graph algorithm first identifies all device nodes corresponding to this user that lie within the (t st ar t , t end ) interval and identifies all edges from these nodes. these edges represent all ap locations visited by the user, and session durations represent the time spent at each location. only edges with the following constraints are considered: ( ) the session must lie within the query time interval, i.e., [t , t ] ∈ [start, t end ] and ( ) the session duration must be at least τ , i.e., (t − t ) ≥ τ . edges that do not satisfy either of the above criteria are ignored and the remaining edges are used to enumerate the ap locations visited by the device and the time duration spent at each location. 
to compute the proximity report, the algorithm traverses each edge and examines the corresponding ap node. for each ap node, the list of incident edges corresponds to all devices that had active sessions with that ap. the session duration [t , t ] on each edge is compared to the infected users session [t , t ] and the edge is included only of the two session overlap. this process yields a list of all other users who had an overlapping session with the infected user. the algorithm can also take an optimal parameter w that indicates the minimum overlap in session between the two for the user to be included in the proximity representation, i.e., w ≥ [t , t ] ∩ [t , t ]. the parameter w specifies the minimum duration of co-location necessary for a user to be included in the proximity report. algorithm lists the pseudo code for our graph algorithm. thus, a time evolving bipartite graph allows for efficient processing of contact tracing queries over a large dataset. since contact tracing technologies use location and proximity information of users, they raise important privacy concerns. privacy concerns for client-side bluetooth-based applications are well-known [ ] . since networkcentric client tracing is an alternative approach that raises a different set of concerns, we discuss these issues in this section and describe techniques used in our ongoing deployments to mitigate them. first, our network-centric tool is aimed at health-care and medical professionals who perform contact tracing and is not an end-user focused tool for self-monitoring prior contacts. contact tracing is a well-established approach that has traditionally been performed manually through questionnaires [ ] . our tool has been designed to fit into this workflow and serves as an additional source of information, in addition to interviews, for professionals engaged in contact tracing. unlike some bluetooth-based apps, it does not allow end-users to lookup information about themselves or anonymous infected users. by focusing on health professionals and not end-users, our tool avoids some of the privacy pitfalls from giving end-users access to anonymous proximity data. second, even though data access is limited to health professionals, the data contains sensitive location information and is still prone to privacy misuse. there are two approaches to handling this problem. first, we recommend that operational control of the tool be in the hands of the organization's it security group. recall that the approach is based on wifi network monitoring data that is already routinely gathered by it departments for network performance and security monitoring. for example, our campus uses such data to track down compromised devices that are connected to our wifi network and may be responsible for ddos attacks from inside. another example is tracking down student hackers, since the hacking of university computers (e.g., to change course grades) is a common exploit on university campuses. audit and compliance laws in many regions also necessitate gathering network logs for subsequent analysis and audits. to address these issues, it departments routinely collect detailed network logs and use them for optimizing performance or handling security incidents. 
since the it department already has access to the raw data used by our network-centric tool, deploying the tool within the it department does not increase privacy risks (since this raw data is already prone to the same privacy risks independent of our tool, and it departments have strict safeguards in place to protect such data and limit access to it). here, limiting operational control of the tool to the organization's it group can provide good privacy protection in practice. however, it may not always be feasible to limit control of the tool to it professionals alone. for instance, larger outbreaks of disease may require allowing direct access to health officials who are performing contact tracing. in this case, we can address privacy concerns by not storing user identities or real mac addresses with the tool itself. instead, user names and device mac addresses are anonymized by a cryptographic hash (eg sha- hash). all queries on the tool are done using hashed identities and not the real ones. the actual mapping of user names and device mac addresses to their hashed values is stored separately from the tool, and this information is accessible only to a small trusted group. to perform contact tracing on an individual, this trusted person needs to authorize it (e.g., once user consent is obtained) by releasing the mapping of the actual name of the user and their device mac to the hashed values. the tool can then be queried using the hashed values of that user's information. similarly, once proximity reports are generated, they can be sent to the trusted person, who can then deanonymize that information using the mapping table. in this manner, it is not possible to query the tool to track activities of an arbitrary user, unless first authorized, which prevents misuse. our current campus deployment uses this anonymized data approach for additional safety. finally, many countries have strict privacy laws that require user consent before collecting sensitive data. to comply, many organizations require users to consent to their it policy that enables them to gather network data for critical safety operations-a prerequisite for such network monitoring. further, health care professionals are required to obtain user consent to perform manual contact tracing-a process that can be used for network-centric contact tracing as well. this section presents our system implementation. we have implemented our system using python and perl. our tool is available as open-source code to researchers and organizations who wish to deploy it (source code is available at http://wifitrace.github.io) as shown in figure , our implementation uses a three tier architecture. the first tier is based on the logging capabilities that are already supported by enterprise-grade wifi networks. our system simply uses these capabilities and implements only the next two tiers. our system currently supports wifi access points from cisco and hp/aruba, two large vendors of enterprise wifi equipment. we have implemented a pre-processing code for both these professionals can decide whether to pro-actively notify co-located individuals who are deemed to be at risk during an outbreak or to publicize a list of locations and times visited by an infected individual(s) and request other users to contact them if they are impacted. in the latter case, the proximity report data is used for further contact tracing once co-located users contact health officials. the latter approach is presently used on our usa campus. 
vendors to take raw monitoring data and convert it to a standard intermediate data format for our second tier. for hp/aruba network, our tool supports the processing of both syslogs (generated by arubas wifi controllers) as well as rtls logs generated by aruba aps. both types of logs provide association and disassociation information. in case of aruba rtls, we log wifi data directly from the controller nodes using either real-time location services (rtls) apis [ ] . in case of aruba syslogs, we periodically copy the raw syslogs generated by the controller and pre-process this raw data. for the cisco networks, we log wifi data directly from the network using the cisco connected devices (cmx) location api v [ ] . all of these preprocessor scripts convert raw logs into the following standard record format : timestamp, ap name or id, device mac id, event type, (optional) user name by default, we assume anonymized (or hashed) device macs and usernames and assume a separate secure file containing a mapping of real names to hashes. our third tier implementation then uses this data to support contact trace querying. a query is of the form (hashed) username or device mac, start duration, end duration, threshold τ , and co-locator treshold w. internally the data generated by the pre-processing code is represented as a bi-partite graph, as discussed in section . . our system supports a variety of queries on this graph through a graph api depicted in table . this graph api is used to implement the graph algorithm described in section . . the algorithm yields a location report, which shows all locations (aps) visited by the user for longer than τ and a proximity report that shows all users who were connected to those aps for a duration greater than w. figure shows a sample location and proximity report generated by our system. in addition to human-readable query reports, our system can optionally output query results in json format, which is convenient for visualization or subsequent processing. our system also supports additional report types beyond location and proximity reports. for example, it can produce reports of additional users who visit a location after the infected user has departed from that location. this is useful when a location has high-contact surfaces that may continue to transmit a contagious disease even after an infected user departs. such a report can produced by specifying a window parameter, that specifies the time window over which additional users are identified as being at risk at each location after the user departs. we have deployed and operationalized our tool on both our university campuses (one in northeastern usa and one in singapore) through a collaboration with our university's health and it service. both campuses have large wifi networks, one with hp/aruba aps and the other with a mixed cisco/aruba network of , aps. while our tool can be used for contact tracing of any infectious disease(we have originally begun developing it inspired by an outbreak of meningitis on our campus), health officials on both campuses view it as a method for scalable contact tracing for covid- . while our tool has been operational for several months, fortunately, as of may , neither campuses had seen any covid cases on the campus that required the use of our tool. this is largely because residential universities such as ours switched to online learning in march and asked most students to vacate their dorms and enforced a work-from-home policy for faculty and staff. 
except for a small number of students who were unable to return to their home countries (due to global lockdowns), the campus have been largely empty. one of our campuses saw a single employee case of covid- , but initial (manual) contact tracing determined that the employee worked in a setting with limited contact with others, and university health professionals did not see a need to perform additional contact tracing, manually or using our tool. wifi sensing has been used by researchers for mobility studies since the early s [ , ] , and it is well established among researchers that wifi devices reveal user mobility patterns. our work builds on this wifibased mobility research, and in this section, we validate its use for network-centric contact tracing. we conducted a small-scale user study to gather ground truth data to validate three question related to the use of passive wifi sensing for contact tracing: ( ) how accurately do wifi access point associations reveal true user locations? ( ) how accurately do wifi session durations reveal true durations of times spent at a location? and ( ) how accurately do co-located wifi device sessions reveal co-located users at those locations? to answer these questions, we had a group of volunteers walk around our campus to visit multiple locations for varying durations while carrying their mobile devices. each user manually logged the entry and exit times at each location as well as the path used to walk from one location to another. the trajectories of some of the user's devices were correlated, which meant the users were co-located whenever the devices were connected to the same ap concurrently. our user study produced a ground truth dataset that includes seven devices that visited a total of , distinct locations over a course of ten days. for each of the user, we computed a location report containing all visited locations (assuming a threshold τ = ) and compared the locations as seen by the wifi network to the ground truth locations recorded by each user. figure (a) shows the confusion matrix, with a precision of . , recall of . , and a high f -score of . . as can be seen, the inferred location matches the ground truth location with high accuracy. the errors mainly occur when a user is walking (in all cases, these involved short session of tens of seconds to minutes). when a user is in transit between locations, their mobile device makes ap transition by disassociating from previous ap and associating with a more proximate ap. the threshold for switching aps and aggressiveness of these switches varies across mobile phone makes, models and manufacturer. this results in some mobile phones that stay connected with an earlier ap even through there is a nearby ap with better connectivity; this can result in a location error where the ground truth location is a bit further away from that shown by the more distant ap. in almost all cases, it the user stays at the new location for more than a few minutes ( to minutes in our observations), their phone switches to the closer ap which has a stronger signal. hence, for very short sessions during walks, the true location may be off from the inferred location by up to one ap "cell. " figure depicts the accuracy of the inferred location for varying session lengths observed across four of the devices (namely, iphone, samsung, motorola and lg phones) used in our user study. as can be seen, once the session length exceeds around minutes, the accuracy rises to %. 
for contact tracing, we are typically interested in locations that visited by a user for a few tens of minutes; as shown in the figure, the approach provides high accuracy for such cases. figure (b) shows a scatter plot of session duration as reported by our tool and the ground truth. as can be seen, there is good match between the actual and ground-truth of session durations; the small errors occur at location entry or exit due to the lag in the mobile device switching to the nearest ap. next we validate the accuracy of co-locations. we use our tool to generate the proximity report for each device and compare it to the ground truth trajectories reported for each device. figure shows the accuracy of the co-located devices as seen by our approach. we see that our approach can capture co-located devices (and users) with high accuracy for sessions exceeding minutes. as noted above, short transitions are often off by one ap cell, which implies that two devices that are near one another will be seen by the network as being connected to adjacent aps, rather than the same one. fortunately these effects do not hamper the efficacy of contact tracing since two users need to be near one another for a period of time (e.g., minutes or more) to be considered at risk. as can be seen, longer sessions are captured with high accuracy. finally, we conduct a validation experiment where we count the number of users entering and leaving room in the library and compare it to the number of devices (users) reported by our approach at that location. as shown in figure , the wifi based occupancy closely follows the ground truth manual count. the slight mis-match occurs for short wifi sessions when a user is present only for a brief period (and when their devices have not switched from the previous ap to the one on the room). the user counts are accurate for all sessions that exceed a few minutes since their devices eventually switch to the closest ap. together, these results validate the efficacy of using passive wifi sensing for location and proximity sensing for contact tracing. in this section, we describe case studies that evaluate the efficacy of our contact tracing tool and also present results on the efficiency of our graph algorithms and general limitations of our wifi sensing approach. we first describe our dataset and then our results. since our tool has been deployed on two university campuses, we use production wifi logs from the university wifi networks for our experimental evaluation. this is the same data that would be used by health professionals for their contact tracing, except that we use a fully anonymized version of this data for our experiments. table depicts the characteristics of the wifi logs. the us university has an aruba network of aps deployed across buildings. it has users comprising students and facultystaff (figures are rounded to the nearest thousand). the dataset spans jan to may , which includes the covid- lockdown that began during spring break (mid-march). the singapore university has a mixed aruba and cisco network comprising , aps deployed across buildings. it has , users comprising , students and , faculty/staff. the dataset spans feb to may and also includes the covid quarantine which was progressively announced by the government, ending with a full lockdown like the us university. 
we randomly choose a user from our dataset and assume they are infected with one of the above diseases and use our tool to compute the number of locations visited by the user over that period and the number of co-located users. we perform contact tracing assuming τ = ω = mins and τ = ω = mins, which implies location visited for at least (or ) minutes and co-location of at least (or ) minutes. for each disease, we repeat each contact tracing experiment for randomly selected students, and then randomly chosen faculty or staff users. figure depicts our results. as can be seen, the number of locations visited by an infected user grows as the duration of contact tracing grows from days for flu to days for measles. we find that the number of location visits is insensitive to τ beyond τ > mins (as discussed in more detail in the next section). a student visits ≈ locations per day while a faculty/staff user is somewhat less mobile and visits ≈ locations per day. figure depicts the proximity results from our contact tracing experiment. as shown, τ = min yields a large number of colocated user, colocated users for flu over a day period, rising to over users for measles over a day period for a student. for τ = min, the number of colocated users is lower-but still high - for flu and for measles. the colocation count is lower for facultystaff users (figure (b) ) but is still quite high ( to ) for τ = and substantially lower (between and ) for τ = mins. these results yield the following insights: • first, we note that the number of colocators does not increase linearly with an increase in contact trace duration. the growth is sublinear indicating that users have a social circle of users and there are repeated interactions with the same set of users over different days. • second, it is infeasible to manually contact trace several hundred users for each infected user. this can be addressed by carefully selecting the parameter τ and ω and also carefully considering the tool output for subsequent manual contact tracing. in particular, τ = min is too low due to a high rate of chance co-location. choosing τ = mins and ω be or mins may yield better results. further our results show that common areas like dining, cafeterias add substantially to the colocation counts. it is straightforward to filter out those ap sessions to determine users with higher risk. figure (a) and (b) shows that the number of co-locators drops substantially once cafeteria visits are excluded. finally, our report (see figure provide the total time spend with colocator in sorted order as well as the location where co-location occurred. it is possible to consider the top n (eg. n = ) users with the most proximity minutes or only consider specific locations such as a small conference room or a classroom for subsequent manual tracing. such strategies are already used by professional contact tracers to hone in on the most probable at-risk co-locators while eliminating users who may be false positives. tracing. while the above experiment involved a single level of contact tracing in many cases, contact tracing may have to be iterative, with each colocator subjected to contact tracing. given that a user may come in contact with more than a hundred users in a single day (eg if they attend a few lectures in the classroom and visit a cafeteria) iterative tracing even for two iterations can be prohibitive. as explained in the previous section, the colocators list needs to be pruned at each step to identify the users at most risk. 
in the previous section, we suggested using a carefully chosen τ and ω to filter out certain locations or focus on high-risk locations (eg a small conference room). these are subjective strategies and can yield errors and miss "true positives". an alternate strategy is to "test and trace", which combines testing with contact tracing -a strategy used by many countries for covid- . in this case, each colocated user is administered a test to check if they are infected, only infected users are subjected to iterative contact tracing and the rest are filtered out. in this case, the number of users subject to contact tracing grows based on the rate of transmission(referred to as the r in the medical literature). for example, if r= , then only out of the several tens of users identified by our tool will be subjected to additional tracing in each step (we assume that all users are tested to find r users who are infected). table depicts the number of users identified by this strategy for testing and tracing -as can be seen, the growth is much lower than a naive iterative strategy. tracing during quarantine periods. while the previous experiments performed contact tracing during pre-covid semester periods where mobility patterns were "normal", we now examine how contact tracing results will change in the presence of strict lockdown policies. figure (a) shows the number of locations visited per day by different types of campus users. while users visited - locations per day for τ = mins during the normal period, after march th , the number of ap locations visits drops sharply for all users due to lockdown policies. this will significantly alter contact tracing results fro our tool for users who become ill during such lockdown. figure (b) shows the number of locations visited for a user subjected to covid- contact tracing (duration of days). as shown the number of locations visited varies from to for τ = and it drops to - locations visits for τ = mins or greater. figure depicts the number of co-locators for τ = mins for several users based on the pre-covid and lockdown mobile patterns. as can be seen, social distancing and lockdown policies bring the colocator count to be less than ten for all types of users, an order of magnitude reduction. in such cases, comprehensive contract tracing of all colocators is feasible through manual means. to evaluate the efficiency of our graph algorithm, we compare the execution time of naive linear search approach and our graph based algorithm across varying size of co-locators. since different users display different amount of mobility, the number of co-locators seen for each user will be different. searching the co-locators using linear search requires complete scan of the entire dataset sequentially, resulting a high overhead across all runs irrespective of the number of observed co-locators of device. additionally, as the number of nodes increase, the search overhead also increases. in contrast, our graph algorithm efficiently identifies relevant edges and nodes relevant to the specified query, thereby reducing the search space overhead also, adding the constraint of τ results in further pruning of edges resulting in reduced search space reducing the time and space complexity of our algorithm. this behavior is depicted in figure that compares the execution overhead of the two approaches for our campus dataset. as shown, our graph-based implementation outperforms the naive sequential search by a significant margin. 
wifi-sensing has well-known limitations and this section analyzes the implications of these limitations on contact tracing. multi-device users : researchers have previously studied the behavior of multi-device users and shown that it is very common for users to own two or more devices [ ] . a key consequence of this result is that device count seen by an ap does not equal user count. while all wifi logs log device association information not all of them provide user ownership information. if such information is missing, radius authentication logs should be additionally used to map devices to owners to avoid double counting devices as separate users. figure shows the number of unique devices seen by aps in different types of campus buildings and the corresponding user count (eg aruba syslogs provide both types of information). as shown locations like dorms and classrooms see between . x to x difference in unique devices and unique users (since users may connect a phone and a laptop to the network), only dining areas (cafeteria) see low over counting since users are likely to carry only their phone when eating. this result highlights the importance of considering device ownership to avoid over counting users by only considering connected devices. unassociated devices : not all users may connect their mobile devices to the wifi network. such devices are visible to the network when they perform ssid scans using a randomized mac address. unassociated devices can cause multiple challenges. first ignoring them altogether will undercount users in a location but simply counting all devices can yield a large number of false positives. figure (a) depicts the number of unassociated devices seen in four buildings in our singapore campus. since the buildings are next to a public road or public bus stop, the number of unassociated devices per day is x greater than the number of associated users. figure (b) shows that enforcing a session duration of minutes filters out most of these chance associations and the number of such devices (likely visitors) is around % of the total number of associated devices. impact of session duration: our contact tracing tool uses two parameters τ and ω that are directly related to wifi session durations. judicious choice of these parameters can allow for a good tradeoff between eliminating false positives and eliminating true positives. figure shows the number of ap locations visited by campus users for varying values of session length τ . the figure shows that the location visits stabilize around τ = mins and then yields - location visits per day. small values of τ include locations visited when in transit and should be ignored. figure shows the impact of varying values of τ and ω and the figure shows a decreasing gradient as both τ and ω are increased for all user types. finally figure shows the number of colocated users for varying values of τ and ω. as shown, using values that are tens of minutes allows the tool to filter out overlapping sessions caused by users in transit. these results highlight the importance of carefully choosing τ and ω depending on the infectious nature of disease but also avoiding false positives. the prevalence of many infectious diseases in our society has increased the importance of contact tracing-the process of identifying people who may have come in contact with an infected person-for reducing its spread and disease containment [ , ] . 
for performing contact tracing, the infected user needs to provide the places visited and persons who were in proximity or direct contact [ ] . while the traditional method relies on interviews, the covid- pandemic has seen the use of a method such as gps, bluetooth [ ] , credit card records [ ] , and cellular locationing. manual contact tracing as a mode for containment of diseases with a high transmission rate has proved to be too slow and cannot be scaled. research [ , , ] has shown that technology-aided contact tracing can aid reduce the disease transmission rate by quicker scalable tracing and help achieve quicker disease suppression. bluetooth and bluetooth low energy (ble) based contact tracing has emerged as a possible method for proximity detection [ ] . a handful of systems based on bluetooth or ble have been rolled out few of which have been supported by the government of various countries such as singapore [ ] and australia [ ] . the main limitation of these approaches is the need for mass adoption before it becomes effective [ ] and its reliance on bluetooth distance measurements, which may not always be accurate. authenticity and privacy attacks are other key issues in using bluetooth for contact tracing. [ ] has shown that authenticity attacks can be easily performed on bluetooth based contact tracing apps. such attacks can result in forging the location visited and creating a fake history of a user introducing risk to the society as shown in [ ] . bluetooth apps suffer from privacy issues as noted in [ , ] . as a result, privacy issues for bluetooth-based contact tracing has received significant attention [ , , ] . privacy-preserving methods include the use of homomorphic encryption for determining contacts [ ] and the use of private messaging to notify possible contacts [ ] , to name a few. technology-aided contact tracing is becoming increasingly important tool for quick and accurate identification of co-locators. while bluetooth-based contact tracing method using phones have become popular recently, these approaches suffer from the need for a critical mass of adoption in order to be effective. in this paper, we presented a network-centric approach for contact tracing that relies on passive wifi sensing with no clientside involvement. our approach exploits wifi network logs gathered by enterprise networks for performance and security monitoring and utilizes it for reconstructing device trajectories for contact tracing. our approach is specifically designed to enhance the efficacy of traditional methods, rather than to supplant it with a new technology. we presented an efficient graph algorithm to scale our approach to large networks with tens of thousands of users. we implemented a full prototype of our system and deployed it on two large university campuses. we validate our approach and demonstrate its efficacy using case studies and detailed experiments using real-world wifi datasets. finally, we discussed the limitations and privacy concerns of our work and have made our source code available to other researchers under an open-source license. 
apple google partner covid- contact tracing singapore built a coronaviris app but it hasnt worked so far tracetogether app covid- contact tracing flusense: a contactless syndromic surveillance platform for influenza-like illness in hospital waiting areas epic: efficient privacy-preserving contact tracing for infection detection bluetrace: a privacy-preserving protocol for community-driven contact tracing across borders assessing disease exposure risk with location data: a proposal for cryptographic preservation of privacy contact tracing mobile apps for covid- : privacy considerations and related trade-offs afshan amin khan, and roohie naaz. . applicability of mobile contact tracing in fighting pandemic (covid- ): issues, challenges and solutions. cryptology eprint archive development finance division. . moef: korea contact tracing johns hopkins university covid dashboard contact tracing and disease control apps gone rogue: maintaining personal privacy in an epidemic epidemic contact tracing via communication traces quantifying sars-cov- transmission suggests epidemic control with digital contact tracing estimated influenza illnesses, medical visits, hospitalizations, and deaths averted by vaccination in the united states sars basics fact sheet quest: practical and oblivious mitigation strategies for covid- using wifi datasets feasibility of controlling covid- outbreaks by isolation of cases and contacts experiences & challenges with server-side wifi indoor localization using existing infrastructure extracting a mobility model from real user traces the effectiveness of contact tracing in emerging epidemics analysis of a campus-wide wireless network interrupting transmission of covid- : lessons from containment efforts in singapore wireless health monitoring using passive wifi sensing location determination using wifi fingerprinting versus wifi trilateration aruba networks. . it analytics for operational intelligence eryk dutkiewicz, symeon chatzinotas, and bjorn ottersten. . enabling and emerging technologies for social distancing: a comprehensive survey stefaan verhulst, and patrick vinck. . mobile phone data and covid- : missing an opportunity a collaborative multiyear, multimodel assessment of seasonal influenza forecasting in the united states covid- epidemic in switzerland: on the importance of testing, contact tracing and isolation a high-resolution human contact network for infectious disease transmission cisco systems. . cisco dna spaces qiang tang. . privacy-preserving contact tracing: current solutions and open questions empirical characterization of mobility of multi-device internet users how to return to normalcy: fast and comprehensive contact tracing of covid- through proximity sensing using mobile devices stressmon: scalable detection of perceived stress and depression using passive sensing of changes in work routines and group interactions analyzing shopperâĂŹs behavior through wifi signals sensorless sensing with wifi key: cord- - sos m authors: glenn, jeffrey; bluth, madeline; christianson, mannon; pressley, jaymie; taylor, austin; macfarlane, gregory s.; chaney, robert a. title: considering the potential health impacts of electric scooters: an analysis of user reported behaviors in provo, utah date: - - journal: int j environ res public health doi: . /ijerph sha: doc_id: cord_uid: sos m electric scooters (e-scooters) are an increasingly popular form of transportation in urban areas. 
while research on this topic has focused primarily on injuries, there are multiple mechanisms by which e-scooter share programs may impact health. the aim of this study is to explore the health-related behaviors of e-scooter users and to discuss their implications for public health. data were collected using an online survey emailed to registered e-scooter users. a total of users completed the survey. descriptive variable statistics and chi-squared analysis were performed to determine variable dependent relationships and equality of proportions. the most common destinations reported were “just riding around for fun”, home, and dining/shopping. the two most common modes of transportation that would have been used if e-scooters were not available were walking ( . %) and using a personal vehicle ( . %). riding behavior was equally mixed between on the street, on the sidewalk, and equal amounts of both. e-scooters in provo are likely having both positive (e.g., air pollution) and negative impacts on health (e.g., injuries, physical inactivity). future research should further explore patterns of e-scooter use and explicitly examine the linkages between e-scooters and areas of health beyond just injuries. there is growing awareness in academic and policy circles of the close linkages between health and urban transportation practices [ ] . stand-up electric scooters (e-scooters), two-wheeled vehicles with a small electric motor and a thin deck on which a single rider stands, are a relatively new micro-mobility option for urban areas and have the potential for both positive and negative health impacts [ ] [ ] [ ] . although research on the health impacts of e-scooters is sparse, the topic merits further exploration given the rapid increase in e-scooter popularity over the past three years in the united states and around the world [ ] [ ] [ ] [ ] . gaining a better understanding of the true positive and negative health impacts of e-scooters must start with more fully understanding e-scooter users and patterns of use [ ] . the potential health impacts of e-scooters depend on answers to questions related to user behaviors-e.g., substituting other forms of transit, commuting vs. recreational use, compliance with safety regulations. while some information exists to help answer these and other key questions, important knowledge gaps remain. the aim of this study is to explore the health-related behaviors of e-scooter users in provo, utah four months after an e-scooter share program was introduced. among the many evidence gaps that remain, this study focuses on four primary research questions: ( ) what motivations do users have for riding e-scooters?; ( ) what are the primary destinations of e-scooter users?; ( ) what alternative travel mode would riders be using if not riding an e-scooter?; ( ) to what degree are e-scooter users aware of and complying with safety regulations? ( ) what program or policy changes do e-scooter users believe would improve provo's e-scooter share program? based on this research, we identify opportunities for policy change that will facilitate positive health impacts of e-scooter use in provo and other cities. we also hope to encourage researchers and policymakers to seek a deeper understanding of patterns of use in diverse contexts as they consider the broad range of potential health impacts of e-scooters. [ , [ ] [ ] [ ] [ ] . in , users took . million trips on shared e-scooters in the united states [ ] . 
two of the largest e-scooter companies, bird and lime, were recently valued at over $ billion each [ , [ ] [ ] [ ] . multiple other companies, including ride share giants uber and lyft, have entered the competitive e-scooter market, which is predicted to become a $ billion industry by , although there is some evidence that the covid- pandemic has contributed to reduced ridership numbers in recent months [ ] [ ] [ ] . while there are variations between programs, in a typical e-scooter share arrangement a private company enters an agreement with local government officials to place e-scooters on city streets and make them available to rent for short periods of time [ , [ ] [ ] [ ] . potential users download a mobile phone application that allows them to view the locations of available e-scooters in real time and to begin, end, and pay for their rides. users are typically charged a flat fee for the rental plus an additional fee for each minute the e-scooter is used. users leave their e-scooters at their final destinations where the e-scooters then become available to other users. within municipal share programs, e-scooters typically have a range between and miles, and speeds are usually capped at miles per hour [ ] . e-scooters are appealing for a variety of reasons. for users, e-scooters offer a convenient, affordable, fun transportation option that serves as an alternative to motor vehicles, biking, and walking [ , ] . e-scooters are frequently used for both commuting and recreational purposes [ ] . for local governments, e-scooters represent a new form of transportation that can help bridge the "last mile" gap, a common obstacle for transit use, by connecting people with public transit nodes [ , , , ] . e-scooters are also seen as an environmentally friendly means for reducing traffic congestion in urban areas [ , ] . moreover, e-scooter programs may be appealing to local officials because government funds are not usually required to start or maintain them; rather, e-scooter companies pay fees that allow government agencies to make infrastructure improvements for e-scooter riders [ ] . e-scooters may even be a contributing factor to economic development because they facilitate easier access to businesses located in urban centers where parking is scarce and motor vehicle travel is more difficult. e-scooters have been warmly welcomed by some municipalities and shunned by others as state and local governments have struggled to enact appropriate regulations to manage the rapid expansion of e-scooter share programs [ , , [ ] [ ] [ ] . significant variation in e-scooter laws exists between states and cities-e.g., helmet use, sidewalk riding, hours of operation. [ , , , , , ] . since many state legislatures have not specifically addressed e-scooter usage, local governments have taken on the brunt of regulatory responsibility by attempting to manage e-scooter use with city ordinances [ ] . e-scooters create complicated liability issues in which municipalities may become liable for e-scooter injuries [ , , ] . there is a range of mechanisms through which e-scooters may affect health. in a recent evidence review, khreis et al. found multiple linkages between urban transport exposures or practices and adverse health impacts [ ] . while the research on e-scooters and health is limited, many of these linkages have been shown or theorized to apply to e-scooters discussed below. 
figure highlights these linkages and illustrates that they are shaped by available transport options and features of the built environment. there is a range of mechanisms through which e-scooters may affect health. in a recent evidence review, khreis et al. found multiple linkages between urban transport exposures or practices and adverse health impacts [ ] . while the research on e-scooters and health is limited, many of these linkages have been shown or theorized to apply to e-scooters discussed below. figure highlights these linkages and illustrates that they are shaped by available transport options and features of the built environment. a primary public health concern, and the focus of the vast majority of academic research on escooters to date, is e-scooter-related injuries [ ] . several studies in the united states and elsewhere have found a high incidence of injuries related to scooter usage, particularly head and limb trauma, after the introduction of e-scooter share programs [ , , , , [ ] [ ] [ ] . there is even some evidence that the injury rate for e-scooters may be higher than that of motorcycles and personal vehicles [ , ] . most injuries are due to falls or collisions with objects (not with motor vehicles) that occur due to poor road conditions or excessive speeds [ , , , , ] . there have also been reports of burns resulting from explosions of batteries [ , ] . in the united states at least nine known deaths have been linked with e-scooter use [ ] . various factors contribute to the prevalence of e-scooter injuries: incompatible infrastructure (e.g., lack of bike lanes), lack of directional tools on e-scooters (e.g., turn signals, headlights), rider inexperience and noncompliance with age restrictions, failure of users to obey traffic rules, alcohol use, and reluctance to wear helmets [ , , , , ] . recent studies have found helmet use among injured e-scooter riders to be extremely low, ranging from % and % in most studies [ , , , , , , , ] . additionally, despite regulations prohibiting them from doing so, e-scooter users commonly ride and park on sidewalks, which can lead to injuries to users as well as to pedestrians [ , ] . one study found that % of collisions occurred on sidewalks where riding was prohibited, and others have found that approximately % of all e-scooter related injuries involve pedestrians [ , , , , , , , ] . vulnerable populations such as the elderly, hearing impaired, and young children have an increased risk for sidewalk-related injuries [ ] . a sizeable proportion of escooter injuries among users involve children under , despite most rental company agreements prohibiting ridership for minors [ ] . many of these hazards relate to cultural norms and limited regulation that may minimize users' perception of potential dangers and therefore lead to unsafe behaviors [ ] . a primary public health concern, and the focus of the vast majority of academic research on e-scooters to date, is e-scooter-related injuries [ ] . several studies in the united states and elsewhere have found a high incidence of injuries related to scooter usage, particularly head and limb trauma, after the introduction of e-scooter share programs [ , , , , [ ] [ ] [ ] . there is even some evidence that the injury rate for e-scooters may be higher than that of motorcycles and personal vehicles [ , ] . most injuries are due to falls or collisions with objects (not with motor vehicles) that occur due to poor road conditions or excessive speeds [ , , , , ] . 
in providing an electric alternative to motor vehicles, e-scooters are typically perceived as an environmentally friendly form of transportation that could lead to lower vehicle emissions and cleaner air in cities where they are being used [ ] . since air pollution is responsible for premature morbidity and mortality from a number of diseases-including, for example, respiratory infections, cardiovascular disease, and premature birth-the potential positive impact of e-scooter use on health is significant [ , ] . exposure to noise from motor vehicle engines, which has been linked to increased incidence of ischemic heart disease, cognitive impairment among children, and sleep disturbance, is also inversely correlated with e-scooter use since the battery-operated engines are essentially silent [ , ] . although e-scooters are commonly seen as a green alternative to gasoline-powered motor vehicles, they present a number of environmental concerns-including greenhouse gas emissions, particulate matter formation, and use of mineral and fossil resources-that often go overlooked [ ] [ ] [ ] [ ] . findings from recent studies suggest that, overall, e-scooters have a more negative life cycle impact on the environment than the transportation modes they are replacing [ , ] . one study found that e-scooters' impact on climate change is better than that of personal automobiles but worse than that of buses with higher ridership or electric bicycles [ ] . with an average lifespan of about years before often ending up in landfills, e-scooters' high level of disposability is a key driver of their negative environmental impact; this impact will likely lessen as the technology improves, a goal e-scooter companies are actively working towards [ , , ] . another major issue is the vehicles used to collect e-scooters each day for charging and relocating. one study found that % of the emissions attributable to e-scooters stem from collection vehicles [ ] .
more sparsely populated areas likely necessitate higher collection miles driven and thus the e-scooters will likely lead to more air pollution than in densely populated urban areas. another environmental concern is the greenhouse gas emissions required to manufacture and assemble e-scooters [ ] . this process includes the extraction of raw materials, including aluminum and lithium, for the e-scooter frames and batteries. very little research exists on the linkage between e-scooters and physical activity. insufficient physical activity is responsible for over million deaths each year as a key risk factor for multiple chronic diseases [ ] . because the act of riding e-scooters in itself likely offers few physical activity benefits, some health researchers have expressed concerns that e-scooters will replace active forms of transportation such as walking and cycling [ ] [ ] [ ] . on the other hand, some advocates have observed a positive association between the increase in e-scooters and more active transportation as cities seeking to accommodate e-scooters have improved infrastructure that indirectly creates an environment and culture more conducive to cycling and walking [ , ] . some e-scooter companies have argued that e-scooters offer a low-intensity workout that can help users increase core strength and exercise their legs, in addition to acting as a "gateway activity" to further exercise [ ] . while these specific claims have yet to be confirmed through research, some preliminary conclusions that e-scooters offer the potential for at least minor physical activity benefits may be drawn from the literature about the positive health benefits of standing compared to sitting [ ] [ ] [ ] . critical questions remain regarding the effects of e-scooters share programs on social life in communities in which they operate. community health and social interaction, which are influenced by neighborhood design and transport infrastructure, have a significant impact on mental health and well-being of community members [ , , ] . community severance occurs where transportation acts as a physical or psychological barrier that separates built-up areas or open spaces [ ] . there are reasons to believe e-scooters may increase community connectedness by improving access to transit, recreation facilities, and other public spaces where social interaction occurs. on the other hand, e-scooters may also contribute to community severance, for example by increasing risks of pedestrian injuries or by acting as a visual symbol of disorder in urban neighborhoods due to erratic placement of e-scooters after use [ , ] . there are reports of frustrated city residents vandalizing e-scooters and even celebrating their actions by posting evidence of that vandalism on social media [ ] . even if e-scooters do not represent a new barrier to community connectedness, the benefits of e-scooter access may not be available equitably to people of lower socioeconomic statuses while any negative health impacts may disproportionately affect these same people, which could exacerbate existing inequalities [ ] . this study looked at e-scooter rider behavior in provo, ut, a city of , people [ ] located approximately miles south of salt lake city, ut. the close proximity of two large universities within or near provo city limits has contributed to a high concentration of residents and traffic. 
provo's mayor was primarily interested in the e-scooter program to improve the air quality of the city by providing zero emission alternatives to driving [ ] . in partnership with the company zagster, an e-scooter share program was introduced in provo in august . the geographic area principally targeted in the e-scooter program lies between downtown provo and brigham young university (byu), where there is a high concentration of college-aged residents, relatively dense commercial and educational land use, and a new bus rapid transit (brt) line. byu does not permit e-scooters on campus. figure shows the geospatial distribution of e-scooter rides observed in october . at the time of the survey, total e-scooters were available on city streets. between august and december , over , rides were taken on provo's e-scooters [ ] . provo city code . . prohibits e-scooter use on sidewalks [ ] . helmet use is not required but is strongly encouraged. while utah state law prohibits people under eight years old from riding an e-scooter, zagster policy requires users to be eighteen or older to rent an e-scooter. this was a cross-sectional study designed to address the primary research questions. data were collected using a -item online questionnaire; three were demographic questions and questions were about riding history, behavior, and knowledge (see appendix a for full question list). demographics included city residence, age, and gender. riding behavior questions included trip origin, trip destination, trip motivation, and street versus sidewalk riding on users' most recent e-scooter trip. open-ended responses were solicited related to changes that would enable street versus sidewalk riding and to e-scooter staging.
the survey was emailed the week of september to all registered zagster users (~ , ) in provo city. a total of users completed the survey, for a response rate of . %. all research procedures were performed in compliance with relevant laws and institutional guidelines. participant demographic characteristics (age, gender, and place of residence) were first calculated, and descriptive variable statistics were then conducted for each item in the questionnaire. after verifying statistical assumptions, chi-squared analyses were performed to determine variable dependent relationships and equality of proportions between demographic characteristics and motivations for riding, destinations, travel mode alternatives, and safety behaviors. quantitative analysis was performed using the r statistical software (r foundation for statistical computing, vienna, austria) [ ] . two researchers used nvivo qualitative data analysis software to thematically code responses to the open-ended survey questions [ ] . these coded responses were then analyzed collectively by the full research team to identify the most prominent emergent themes. the majority of respondents were - years old ( . %), and % were under years old. more men than women completed the survey ( % vs. %). roughly % of participants were residents of utah county (provo city- . %, utah county- . %) ( table ) . the most frequently mentioned reason for riding e-scooters was "to have fun" ( . %) followed by "to save time" ( . %) (see table ). though "having fun" was the top reason for riding e-scooters for both men and women, significantly more women ( . %) reported riding for this reason compared to men ( . %) (χ = . , df = , p < . ). similarly, men were more likely to ride "to avoid parking hassles" ( . %) compared to women ( . %) (χ = . , df = , p = . ). college-aged (ca) persons aged - years old comprise the largest portion of e-scooter ridership in provo city ( . %). while more non-ca persons ( . %) than ca persons ( . %) reported a motivation for riding e-scooters was to have fun (χ = . , df = , p < . ), more ca than non-ca persons reported a motivation was to save time ( . % compared to . %; χ = . , df= , p < . ). the most common destinations to which e-scooters are reportedly being ridden are "just riding around for fun" ( . %), home ( . %), and dinning/shopping locations ( . %) (see table , a full table is presented in appendix a). there were no statistical gender differences with respect to destination with the exception of school; men tended to ride to school more ( . %) than women ( . %) (χ = . , df = , p = . ). the destinations of ca persons were different in many instances compared with non-college-aged persons. ca persons were less likely to use the e-scooter when dining out/shopping (χ = . , df = , p = . ; ca = % vs. non-ca = %); to just ride around for fun (χ = . , df = , p < . ; ca = . % vs. ca = . %); and, to work (χ = . , df = , p = . ; ca = . % vs. non-ca = . %). conversely, they were more likely to ride home (χ = . , df = , p = . ; ca = % vs. non-ca = %); to school (χ = . , df = , p < . ; ca = . % vs. non-ca = . %); and to social gatherings (χ = . , df = , p = . ; ca = . % vs. non-ca = . %). the two most common modes of transportation that would have been used if e-scooters were not available were walking ( . %) and using a personal vehicle ( . %) (see table ). the only statistical difference by gender was for bicycling, where men were more likely to use a bicycle if an e-scooter were unavailable ( . % vs. . %) (χ = . , df = , p = . 
). similar to trip destination, there were significant differences between the - -year-old ca and non-ca group. as an alternative to e-scooters, ca persons were less likely to have used a bicycle (χ = . riding behavior was equally mixed between on the street (n = , . %), on the sidewalk (n = , . %), and equal amounts of both (n = , . %). sidewalk and street riding was associated with gender in that men were more likely to ride on the street (χ = . , df = , p < . ) and women were more likely to ride on the sidewalk (χ = . , df = , p = . ). there was no difference between genders who reported to ride equally on the street and sidewalk. likewise, ca persons were less likely to ride on the street ( . % vs. . %; χ = . , df = , p < . ) and more likely to ride on the sidewalk ( . % vs. . %; χ = . , df = , p < . ). there was no difference by age for those who equally rode between sidewalk and street. the majority of respondents did not know that it is illegal, according to provo city code, to ride e-scooters on the sidewalk (n = , . %). there were no differences between genders but there were by age. college-aged persons were less likely to know about the sidewalk riding code ( . % vs. . %; χ = . , df = , p = . ). when asked what program changes would make them ride on the street rather than the sidewalk, some participants ( %) reported that they would have ridden in the street if they had known that it was acceptable to do so and/or they could be sure drivers were aware that they were allowed to do so. overwhelmingly, most of the respondents ( %) asked for the addition of bike lanes and/or better constructed bike lanes throughout provo. there were very few mentions of where the bike lanes should be added. most respondents use the adjectives "good", "wider", "improved", "clearly marked", and "painted" when describing what was meant by better bike lanes. the next highest response called for better roads; % of respondents said that provo streets had potholes, narrow roads, bumpy streets, and a lack of lane divisions. reckless drivers and curbside parking-which blocks bike lanes, takes up room on the shoulders, and pushes scooter riders further into the center of the road-were cited as deterrents to riding off the sidewalk. when asked if there was anything else users wanted to mention about the scooters, many of them ( %) simply stated that they enjoyed having scooters in the area. this study sought to consider the relationship between e-scooters and health by gaining a better understanding of e-scooter users and their behaviors in provo, ut. while e-scooters may affect health in the various ways, whether they have a net positive or negative impact on health depends largely on why and how people are riding them. two-thirds of users who responded to the survey were men, and over half were - years old. this age range is similar to what we would expect given the age demographics of provo where the median age is . and % of the population is between ages - [ ] . while injury data from provo has not yet been reported, previous studies in other cities found that the majority of e-scooter injuries were among male millennials, the same demographic group who make up the majority of e-scooter users in provo [ , , , ] . user compliance with safety regulations is another important health-related factor addressed by these data. 
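the chi-squared comparisons reported in these results can be reproduced with standard statistical libraries. the sketch below is a minimal python illustration rather than the r code the authors used, and the contingency counts shown are hypothetical placeholders, not the study's data.

    # minimal sketch of a chi-squared test of independence for a gender-by-behavior
    # comparison; the counts are hypothetical placeholders, not the study's data.
    from scipy.stats import chi2_contingency

    # rows: gender (men, women); columns: rode on street vs. rode on sidewalk
    observed = [[120, 45],   # hypothetical counts for men
                [60, 80]]    # hypothetical counts for women

    chi2, p, df, expected = chi2_contingency(observed, correction=False)
    print(f"chi-squared = {chi2:.2f}, df = {df}, p = {p:.4f}")

the same call covers each of the reported motivation, destination, alternative-mode, and riding-location comparisons, one contingency table at a time.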
despite the zagster policy requiring renters to be at least years of age, % of all respondents were under , which raises concerns about user safety and the ability of zagster (and other private e-scooter companies) to enforce its rider policies. additionally, only . % of users reported complying with local law and riding exclusively on the street while the rest reported riding at least partially on the sidewalk. while data from other cities is limited, a pilot e-scooter program in portland, or found that the proportion of sidewalk riders varied greatly depending on street design- % rode on sidewalks with a mph speed limit compared to % with a mph speed limit, and % rode on sidewalks if a protected bike lane existed compared to % where there were no bike facilities [ ] . riding on sidewalks is overall more dangerous for users and much more likely to lead to pedestrian injuries as have been found in previous studies [ , , ] . the finding that women were more likely than men to avoid on-street riding is consistent with research on gender differences in cycling behavior that finds safety perception is a major factor [ , ] . the majority of users ( . %) reported being unaware that e-scooters were not permitted on sidewalks, which represents a higher proportion of riders compared to the % of users in rosslyn, va who were not familiar with e-scooter laws concerning sidewalk riding [ ] . while the difference between cities likely reflects that the e-scooter program in rosslyn had been active for a longer period of time, the lack of knowledge around laws suggests that better educating users may be a first step in reducing unsafe riding behavior. nearly % of users in provo mentioned in their open-ended responses that additional and improved bike lanes would make it easier for them to ride on the street, which highlights another opportunity, albeit one requiring a greater financial investment from the city, to create a safer environment for e-scooter users. the data show that e-scooter users in provo choose to ride for a variety of reasons. the top reason given was to have fun ( . %) and the top destination reported by users was "just riding around for fun" ( . %). however, a sizeable number of users also report riding e-scooters to commute to work ( . %) or school ( . %) and for other purposes such as dining/shopping ( . %) and traveling to social gatherings ( %). these numbers are similar to those in portland, or where . % of riders used e-scooters for recreation or exercise while % used them to get to a destination [ ] . convenience appears to be an important motivator as the second and third top reasons given for riding e-scooters in provo were to save time ( . %) and to avoid parking hassles ( . %). interestingly, ca users were more likely to rent e-scooters to save time than to have fun whereas non-ca users reported the opposite. ca users were also more likely to ride to school and social gatherings while non-ca users were more likely to ride to dine out/shop or commute to work. our findings suggest that age is more influential on trip destination as opposed to gender. a very small percentage of riders ( . %) reported their destination as a public transit stop, which may indicate that e-scooters in provo are not necessarily delivering on the promise of solving the "last mile" problem, although because the questionnaire asked specifically about the most recent trip it is likely that some of the other riders were connecting from public transit. 
this is a key question that should be addressed through a different survey design in future research. given the variety of motivations cited for riding e-scooters, a critical question in terms of health implications is: for which alternative modes of travel are e-scooters being substituted? the most common response, given by . % of users, was that they would have walked if an e-scooter had not been available; among ca riders this percentage increased to . %. additionally, % of users reported riding e-scooters instead of bicycling. similarly, in portland, or % and % of e-scooter riders, respectively, would have walked or biked instead of using an e-scooter, and in raleigh, nc % of riders would have walked or biked [ , ] . the most likely impact of these findings is an overall reduction in physical activity levels because e-scooters are replacing more active forms of transportation. while this may be cause for concern in terms of health, on the other hand, . % of e-scooter users reported that they would have used a personal vehicle or rideshare service (i.e., uber, taxi) if an e-scooter had not been available. this number is comparable to data from rosslyn, va ( %), raleigh, nc ( %), and portland, or ( %) [ , , ] . (in portland, % of users even reported getting rid of a personal vehicle due to e-scooter availability [ ] .) these rides represent fewer cars on the road and, in all likelihood, an overall reduction in local air pollution and associated poor health. additionally, the survey does not capture the possibility that the respondent would have chosen a different destination entirely were an e-scooter not available. given that e-scooters are best designed for short trips in urban areas, it is possible that the avoided motor vehicle trips would have been longer than their e-scooter substitutes. this finding is particularly relevant for provo city, a place with problematic winter air pollution and whose primary motivation for introducing e-scooters was to provide a green alternative to motor vehicles; yet, considering disposability issues and emissions due to the collection and placement of e-scooters, important questions remain about the full environmental impact and its implications for health. based on the findings of this study, there are several policy change strategies that could help optimize the health impacts of e-scooter share programs in provo and in other cities. first, to reduce the probability of injuries, more training and strategically placed educational information (e.g., signs posted in high traffic areas) should be provided to increase users' knowledge about safety precautions (e.g., avoiding sidewalks, safe parking) and users' e-scooter operating skills. considering the shared road space, information should also be provided to help drivers, cyclists, and pedestrians be more aware of e-scooter riders. second, as evidenced by ridership among children, there is an enforcement gap in zagster's ability to enforce safety policies. similarly, although this study did not explicitly explore helmet use, informal observation on the streets of provo suggests that helmet use is extremely rare among e-scooter users, which is consistent with other studies [ , , ] . to improve safety, cities should work with private e-scooter companies to identify ways, which may include the passing of additional local ordinances, to detect violations and enforce policies. third, for e-scooters to experience long-term success it is clear that bike lanes and other infrastructure must continue to improve.
when asked about possible improvements that would encourage them to ride on the street, the vast majority stated designated lanes would be most helpful (n = ). enhanced education and training alone will likely be ineffective without a more conducive riding environment, which should be a priority for city decision makers concerned with improving safety. fourth, while a sizeable proportion of users are substituting e-scooters for personal vehicles, there are still negative environmental impacts that should be considered and minimized. zagster recently introduced a new, more durable model of e-scooter to provo city streets, and city policymakers should continue to push for e-scooters that have longer durability. they should also work with zagster to ensure low-emission vehicles are used for collecting and placing e-scooters, and that the routes driven for these tasks are as short as possible. finally, cities should consult regularly with community members-those who use e-scooters and those who do not-to understand the impacts of e-scooters on community severance and social interactions, particularly among marginalized populations. this study makes a significant contribution to the literature by applying an existing health impact framework and proposing a range of linkages between health and e-scooters, a rapidly emerging public health issue for which previous studies have focused almost exclusively on injuries. the study also reports data from a relatively large sample of e-scooter users on their self-reported behavior, which is scarce in the academic literature and serves as a starting point for understanding how population health may be impacted by e-scooters. these data led to concrete, valuable recommendations for policymakers in provo and other places, especially mid-sized cities, which are currently grappling with instituting appropriate policy responses for the variety of issues that come with e-scooter programs. some critical limitations should be noted. first and foremost, while this study considered linkages between e-scooters and health, its findings do not directly address the health impact of e-scooters on provo residents. even where it adds value in terms of providing a clearer picture of why and how e-scooters are being used, it omits key demographic variables (e.g., race/ethnicity, income, etc.) that are essential for understanding some of the connections between e-scooters and health (e.g., community severance). finally, the survey used is potentially problematic because it represents a single point in time shortly after e-scooters were introduced in provo. it also asks only about users' most recent trip and it relies on responses from a small, non-representative sample of registered zagster users, which may be a source of bias if those who elected to respond to the survey have different patterns of behavior than those who did not respond. the use of an online survey may also lead to bias by selecting for younger, internet-using adults; however, this is not a major concern because these respondents are also the most likely to be e-scooter users. e-scooters are a nascent public health issue that positively and negatively affect health in a number of ways, including through injuries, air pollution levels, physical activity levels, and community severance. to understand the full impact of e-scooters on health we need to gain a thorough understanding of e-scooter users and their patterns of use.
this study found that in provo, ut e-scooter users are predominantly male, college-aged individuals who ride e-scooters for a variety of reasons, the top being for recreational purposes. most users were unaware of laws prohibiting e-scooters from sidewalk riding, which led to two-thirds of users riding at least part of the time on sidewalks. about half of users would be walking or riding bicycles if e-scooters were not an option, while about one-third would be driving a personal vehicle. thus, e-scooters in provo are likely having both positive and negative impacts on health. future research, perhaps in the form of a health impact assessment, should be designed explicitly to examine the linkages between e-scooters and areas of health beyond just injuries, e.g., by focusing on community severance among marginalized communities or on users' physical activity levels. research is also needed to evaluate the impact of policies and interventions designed to reduce e-scooter-related injuries. thoughtful, evidence-based implementation of e-scooter programs is critical to ensuring a future net positive benefit to public and community health. the authors declare no conflict of interest.
on your last trip, why did you choose to ride a scooter?
on your last trip, what mode of transportation would you have taken had a scooter not been available?
on your last trip, where did you ride to?
on your last trip, where did you ride from?
on your last trip, where did you primarily ride?
if you rode on sidewalks, why did you choose to do so?
did you know that provo city code . . prohibits riding on sidewalks? (don't worry, we won't tell on you.)
what changes would make you want to ride in the street instead of the sidewalk?
where should scooters be staged in the morning that they aren't currently?
anything else you'd like to tell us?
what city do you live in?
what is your age?
what is your gender?
health impacts of urban transport policy measures: a guidance note for practice
are e-scooters polluters? the environmental impacts of shared dockless electric scooters
understanding spatio-temporal heterogeneity of bike-sharing and scooter-sharing mobility
electric scooters: batteries in the battle against ambient air pollution? lancet planet
injuries associated with standing electric scooter use
the e-merging e-pidemic of e-scooters. trauma surg
behavior of electric scooter operators in naturalistic environments
electric scooters: why bird, lime, skip, and spin are taking over cities. vox
analysis of e-scooter trips and their temporal usage patterns
electric scooters: case reports indicate a growing public health concern
the integration of electric scooters: useful technology or public health problem?
the electric scooter: a surging new mode of transportation that comes with risk to riders
electric scooters are going worldwide
are shared electric scooters going extinct? lime's valuation reportedly tanks
bird raises $ million at a $ . billion valuation
lime vp on company's meteoric rise to $ billion valuation; venturebeat
electric scooter-sharing grinds to a halt in response to the covid- pandemic. the verge
trends in e-scooter litigation: an update on the still-evolving body of law. bench bar minn
are electric scooters promoted on social media with safety in mind? a case study on bird's instagram
injury from electric scooters in copenhagen: a retrospective cohort study
craniofacial injuries seen with the introduction of bicycle-share electric scooters in an urban setting
can e-scooters solve the 'last mile' problem? they'll need to avoid the fate of dockless bikes. conversat
israel trauma group. the casualties from electric bike and motorized scooter road accidents
early experience with electric scooter injuries requiring neurosurgical evaluation in district of columbia: a case series
craniofacial injuries related to motorized scooter use: a rising epidemic
the case for electric scooters
invasion of the electric scooter: can our cities cope? guard
sharing the sidewalk: a case of e-scooter related pedestrian injury
shared micromobility policy toolkit: docked and dockless bike and scooter sharing
emergency department visits for electric scooter-related injuries after introduction of an urban rental program
impact of electric scooters to a tertiary emergency department: -week review after implementation of a scooter share scheme
new peril on our roads: a retrospective study of electric scooter-related injuries
electric scooters were to blame for at least injuries and deaths in the us last year
epidemiology of dockless electric rental scooter injuries
received cases of e-scooter fires in
ntd control and health system strengthening
pedestrians and e-scooters: an initial look at e-scooter parking and perceptions by riders and non-riders
where do riders park dockless, shared electric scooters? findings from
e-scooter findings report
transport for health: the global burden of disease from motorized road transport
lessons from the streets of paris. conversat
electric scooters aren't quite as climate-friendly as we thought. the verge
it's dockless scooters! but can these electric-powered mobility options be considered sustainable using life-cycle analysis?
dockless e-scooter: a green solution for mobility? comparative case study between dockless e-scooters, displaced transport, and personal e-scooters
global, regional, and national comparative risk assessment of behavioural, environmental and occupational, and metabolic risks or clusters of risks in countries, - : a systematic analysis for the global burden of disease study
electric scooters on collision course with pedestrians and lawmakers. conversat
our position on e-scooters
local bike advocates: e-scooters are game-changing. streets blog usa
the nexus
does an electric scooter keep you fit?
replacing sitting time with standing or stepping: associations with cardio-metabolic risk biomarkers
continuous dose-response association between sedentary time and risk for cardiovascular disease: a meta-analysis
sitting less and moving more: improved glycaemic control for type diabetes prevention and management
the social and distributional impacts of transport: a literature review
rethinking the links between social exclusion and transport disadvantage through the lens of social capital
as electric scooters proliferate, so do minor injuries and blocked sidewalks. npr
fed-up locals are setting electric scooters on fire and burying them at sea
provo launches new motorized scooter program. dly. her
ready to roll: shareable electric scooters return to utah county
frequently asked questions regarding scooters
r: a language and environment for statistical computing
nvivo (version )
motorized scooter injuries in the era of scooter-shares: a review of the national electronic surveillance system
gender differences in recreational and transport cycling: a cross-sectional mixed-methods comparison of cycling patterns, motivators, and constraints
explaining gender difference in bicycling behavior
this article is an open access article distributed under the terms and conditions of the creative commons attribution (cc by) license
key: cord- -hu raqyi authors: finazzi, francesco; fassò, alessandro title: the impact of the covid‐ pandemic on italian mobility date: - - journal: signif (oxf) doi: . / - . sha: doc_id: cord_uid: hu raqyi
francesco finazzi and alessandro fassò use location data collected by an earthquake-monitoring app to gauge compliance with lockdown measures in italy.
make some assessment of the public's compliance with mobility restrictions during the period of maximum growth of infections and hospitalisations. we have done this using a smartphone application originally designed to monitor, detect, and alert users to nearby earthquakes. the app was previously discussed in significance back in . it forms part of a project called "earthquake network" (sismo.app). members of the public are invited to download the app and, once installed on a smartphone, the app serves two purposes: it uses data from a phone's accelerometers to provide real-time seismic monitoring and, when a seismic event is detected, the app uses a phone's location data to alert users who are in or near the vicinity of an event. in order to provide real-time detection and alerts, the app collects phone location data approximately once every minutes. the location data is sent anonymously to the processing server, which is responsible for identifying the seismic event thanks to a statistical approach. although the data is anonymous, each user has a unique identifier. it is therefore possible to track the movements of each smartphone/user hours a day. all of this takes place in compliance with privacy and the general data protection regulation, allowing the user to delete their data from the server if required. for our analysis of movement under the coronavirus lockdown, we used location data for the period from march to april , based on a sample of about , italian app users. the daily trajectory of each user was analysed in order to evaluate the average distance travelled each day by users and the percentage of users who had not moved for hours.
figure : mobility in italy estimated through smartphone data collected by the earthquake network project. the orange line represents the percentage of users who have not moved for hours. the blue line represents the average daily distance travelled in kilometres. confidence intervals obtained using the bootstrap technique. on the horizontal axis, saturdays and sundays are shown in red.
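as a rough illustration of how the two headline metrics and their bootstrap confidence intervals could be computed from such location fixes, the sketch below assumes each user's daily fixes are available as (lat, lon) pairs; the function names, the stay-at-home distance threshold, and the haversine step are illustrative assumptions, not the authors' code.

    # minimal sketch (not the authors' code) of the average daily distance, the
    # percentage of users who did not move, and a percentile bootstrap interval.
    import math, random

    def haversine_km(p, q):
        # great-circle distance in kilometres between two (lat, lon) points
        lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
        a = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371.0 * math.asin(math.sqrt(a))

    def daily_distance_km(fixes):
        # total distance over one user's time-ordered fixes for a single day
        return sum(haversine_km(a, b) for a, b in zip(fixes, fixes[1:]))

    def mobility_metrics(users_fixes, stay_threshold_km=0.1):
        # users_fixes: list of per-user fix lists for one day (threshold is assumed)
        dists = [daily_distance_km(f) for f in users_fixes]
        avg_km = sum(dists) / len(dists)
        pct_stayed = 100.0 * sum(d < stay_threshold_km for d in dists) / len(dists)
        return avg_km, pct_stayed

    def bootstrap_ci(users_fixes, n_boot=1000, alpha=0.05):
        # percentile bootstrap over users for the average daily distance
        stats = []
        for _ in range(n_boot):
            sample = [random.choice(users_fixes) for _ in users_fixes]
            stats.append(mobility_metrics(sample)[0])
        stats.sort()
        return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]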
the task was made more difficult by the fact that the reported location of smartphones is affected by uncertainty (ranging from a few meters to a few kilometres) and by the fact that a smartphone may be subject to "ghost" movements, due to the increase in uncertainty about its position rather than to any real movement. however, techniques such as the kalman filter allow us to estimate a trajectory faithful to the true trajectory travelled by the smartphone and to understand which smartphones actually moved. figure shows, for each date, the average distance travelled by users (blue line) and the percentage of users who had not moved within a -hour period (orange line). we refer to this latter group as "% #istayathome", in reference to the twitter hashtag widely used by people tweeting in support of the lockdown. it is worth noting that the app data come from a self-selecting sample, rather than a random sample, and that, typically, the earthquake network app is not used by children or older people. hence, we think that the "true" population figures for average distance travelled and percentage staying at home could show an even steeper trend.
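as one way to picture the kalman filtering step mentioned above, the sketch below smooths noisy planar fixes (x, y in metres) with a constant-velocity model; the sampling interval and noise settings are illustrative assumptions, not the authors' implementation.

    # small constant-velocity kalman filter sketch for suppressing "ghost" movements;
    # dt, accel_std and meas_std are illustrative assumptions, not the authors' values.
    import numpy as np

    def kalman_smooth(positions, dt=600.0, accel_std=0.05, meas_std=50.0):
        # state: [x, y, vx, vy]; positions: array of shape (n, 2) in metres
        F = np.array([[1, 0, dt, 0],
                      [0, 1, 0, dt],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1]], dtype=float)
        H = np.array([[1, 0, 0, 0],
                      [0, 1, 0, 0]], dtype=float)
        G = np.array([[0.5 * dt ** 2, 0],
                      [0, 0.5 * dt ** 2],
                      [dt, 0],
                      [0, dt]])
        Q = G @ G.T * accel_std ** 2          # process noise from random acceleration
        R = np.eye(2) * meas_std ** 2         # measurement noise (reported accuracy)
        x = np.array([positions[0, 0], positions[0, 1], 0.0, 0.0])
        P = np.eye(4) * meas_std ** 2
        out = []
        for z in positions:
            # predict
            x = F @ x
            P = F @ P @ F.T + Q
            # update with the new position fix
            y = z - H @ x
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ y
            P = (np.eye(4) - K @ H) @ P
            out.append(x[:2].copy())
        return np.array(out)

the filtered trajectory can then be fed to the same distance computation used for the raw fixes, so that spurious jumps caused by a temporary loss of positioning accuracy do not inflate the distance travelled.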
if the coronavirus pandemic persists or occurs cyclically, large-scale monitoring of the population and of the risk of contagion is likely to be adopted. in this context, it will be useful to have a statistical methodology for modelling the mobility of individuals at the personal level and the interaction between them, as well as having dedicated apps for receiving alerts in case of increased personal risk.
how a smartphone network detects earthquakes in real time
a statistical approach to crowdsourced smartphone-based earthquake early warning systems
il cambiamento degli stili di vita e l'impatto della pandemia di covid- sulla qualità dell'aria [the change of lifestyles and the impact of the covid- pandemic on air quality]
key: cord- - bvshhtn authors: ng, pai chet; spachos, petros; plataniotis, konstantinos title: covid- and your smartphone: ble-based smart contact tracing date: - - journal: nan doi: nan sha: doc_id: cord_uid: bvshhtn
contact tracing is of paramount importance when it comes to preventing the spreading of infectious diseases. contact tracing is usually performed manually by authorized personnel. manual contact tracing is an inefficient, error-prone, time-consuming process of limited utility to the population at large, as those in close contact with infected individuals are informed hours, if not days, later. this paper introduces an alternative to manual contact tracing. the proposed smart contact tracing (sct) system utilizes the smartphone's bluetooth low energy (ble) signals and a machine learning classifier to accurately and quickly determine the contact profile. sct's contribution is two-fold: a) classification of the user's contact as high/low-risk using precise proximity sensing, and b) user anonymity using a privacy-preserving communications protocol. sct leverages ble's non-connectable advertising feature to broadcast a signature packet when the user is in the public space. both broadcasted and observed signatures are stored in the user's smartphone and they are only uploaded to a secure signature database when a user is confirmed by public health authorities to be infected. using received signal strength (rss), each smartphone estimates its distance from other users' phones and issues real-time alerts when social distancing rules are violated. the paper includes extensive experimentation utilizing real-life smartphone positions and a comparative evaluation of five machine learning classifiers. reported results indicate that a decision tree classifier outperforms other state-of-the-art classification methods in terms of accuracy. lastly, to facilitate research in this area, and to contribute to the timely development of advanced solutions, the entire data set of six experiments with about , data points is made publicly available.
index terms-bluetooth low energy, smartphone, covid- , physical distancing, proximity, contact tracing. contact tracing is an important step in containing a disease outbreak [ ] , [ ] . many efforts have been devoted to tracing a list of contacts when a person is diagnosed with a highly infectious disease, such as covid- . the current contact tracing method, which requires a collaborative effort from several authorized personnel, is labor-intensive and time-consuming [ ] . since it takes time to trace the contacts, the group of users who have been in contact with an infected individual might spread the disease to another group of people before they are informed. it is critical to have an effective contact tracing method that not only informs potential users automatically and immediately but also reduces the required amount of labor [ ] . to this end, a smart contact tracing (sct) system is introduced by exploiting the bluetooth low energy (ble) signals on smartphones. ble is ubiquitous and is readily available on many smartphones, making it ideal for the introduced system [ ] . on the other hand, smartphones have become an intimate device in our everyday life. while we might leave the smartphone away from us when we are in our private space (e.g., home, private office, etc.), we always carry the smartphone when we do the grocery shopping, commute on public transport, walk along the open street, etc. in this way, smartphones are the best choice for contact tracing, in which the tracing is only performed when the user is in the public space. an overall illustration of our introduced sct system is shown in fig. . at any time, no location or any other information regarding the users is collected or transmitted. the application uses only ble signals and no information exchange. the system has three main objectives: preserve privacy, provide accurate contact tracing, and provide real-time proximity alerts. preserve-privacy. we leverage the beaconing feature in ble wireless technology to broadcast an encrypted packet periodically [ ] . this encrypted packet is broadcast on the non-connectable advertising channels (i.e., ch , , and ). hence, our proposed sct system can prevent unauthorized access to the user's smartphone.
furthermore, the packet encrypts a piece of unique signature information based on the ambient environmental features the smartphone encountered at a particular time. this signature is unique and is almost impossible to be duplicated by another device on another occasion. all the broadcast signatures and observed signatures will be stored in the local storage. the user is only required to upload their own broadcast signatures to the signature database when the user is confirmed to be infected with the contagious disease. otherwise, the signatures stored in the local storage will be deleted automatically when they expire. we define the signature expiration according to the disease spreading time window, as suggested by the health authorities. by comparing the signature of each smartphone, a list of possible contacts can be retrieved without explicitly revealing the sensitive information of the infected user. accurate contact tracing. the smartphone application identifies contacts in proximity, over time. it records the estimated distance and the duration of interaction between individuals. in this way, it will identify when someone has been too close to an infected person for too long (the too close for too long (tc tl) problem). for instance, when people hug each other they are too close for a short period of time, while inside the cabin of a flight, people can be within ten meters of each other for too long, breathing the same air. at the same time, a distance of two meters in a classroom might be safe while three meters in a subway train might trigger an alert. we have to rely heavily on the virologists and the epidemiologists to identify the healthy distance in different environments, and through this application, we can give them access to these crucial details. real-time proximity alerts. the application will provide a real-time alert to the user if the physical distancing between any two users is not maintained. this will be achieved by detecting the proximity between any users in a given location, including the grocery store, public transit, etc. this proximity information can be retrieved by inspecting the rss patterns from the user's smartphone [ ] . a smartphone can measure the rss value upon seeing the packet broadcast by a nearby smartphone. since rss is inversely proportional to the square of the distance [ ] , we can use it to estimate the distance between any two smartphones and then classify the proximity based on the recommended physical distancing rule. precise distance estimation through the rss is necessary to determine the proximity between any two smartphones. however, rss is subject to severe fluctuation, especially due to the body shadowing effect, since the smartphone is carried by users [ ] . we examine five machine learning-based classifiers: decision tree (dt), linear discriminant analysis (lda), naive bayes (nb), k nearest neighbors (knn), and support vector machine (svm), over six different smartphone positions: hand-to-hand (hh), hand-to-pocket (hp), hand-to-backpack (hb), pocket-to-backpack (pb), pocket-to-pocket (pp), and backpack-to-backpack (bb). in summary, this paper has the following contributions: • privacy-preserving signature protocol: our sct system provides secure contact tracing by using the non-connectable advertising channels and an encrypted packet containing unique signature information based on the ambient environmental features observed by a smartphone.
• proximity sensing and real-time physical distance alert, with precise distance estimation: we classify the proximity of a user to any other user by estimating the distance between any two users based on the rss values measured by each smartphone, while we push a notification to alert the users when anyone violates the physical distancing rule. after approximately s of interaction between smartphones, the system is able to provide a reliable estimation. dt is the most accurate classifier. • smartphone implementation and effects of smartphone's position: we prototyped our system design and implemented the application on modern smartphones to demonstrate the feasibility of our proposed sct. the energy requirements of the application are negligible. we compared the classifiers in terms of their estimation accuracy, while examining six different positioning sets of smartphones. when the users have their smartphones in similar positions, the classifiers can improve accuracy. • extensive experiments: we performed extensive experiments in a real-world setting to verify the effectiveness of our sct. all the collected data is available in the ieee dataport [ ] and github [ ] . the overall dataset contains the measurement data obtained from six experimental sets, amounting to a total of , data points. we believe that the dataset will serve as an invaluable resource for researchers in this field, accelerating the development of contact tracing applications. contact tracing aims to track down a group of users who have encountered an infected individual. the goal is to inform this group of users regarding the potential risk that they might face so that they can take appropriate actions as recommended by the local health authority. contact tracing could be a viable solution in resuming the normal lifestyle while preventing a further virus outbreak. an illustration of the differences in having a contact tracing system is shown in fig. . in practice, a person will be quarantined immediately when they are confirmed to be infected with the disease. however, those people who have been in close contact with the infected individual are still free to move without realizing that they may have already got infected and become virus carriers. with contact tracing, we can inform most of the potential close contacts so that they can take appropriate action to isolate themselves from the crowd. recognizing the importance of contact tracing, many countries have put effort into developing a smartphone-based contact tracing system. • china: in china, a close contact detector based on qr code technology is implemented [ ] . the application is developed based on a surveillance strategy that monitors people's movement within the country, and it can push an alert to users if they have been in close contact with the infected individual. • south korea: in south korea, the location data (i.e., the gps data) obtained from the user's smartphone are used to detect the distance of the users from the infected individual. the tracker application will push a notification that contains the personal details of the infected individuals to the potential users who have been in contact with the infected individual [ ] . • singapore: in singapore, a privacy-preserving approach is adopted, by using the ble signal on the smartphone to detect the proximity between any two individuals.
the tracetogether application broadcasts an encrypted packet, which is generated by a secret key distributed by the ministry of health, given the phone number [ ] . the application will also alert the users when they are in close contact with the infected individual. the first two approaches might compromise users' privacy since their applications need to monitor users' mobility and locations. on the other hand, in the third approach, although it preserves privacy by tracking only the proximity between users without explicit location information, the encryption process involves the user's phone number. hence, the phone number might be retrieved by a malicious hacker. besides the national-level effort, there are collaborations in industry and academia in delivering an effective contact tracing solution while preserving user privacy [ ] , [ ] . rather than using location data, many of these initiatives focus on the use of ble signals for proximity detection. for instance, pan european privacy-preserving proximity tracing (pepp-pt) detects the proximity based on the broadcast ble packet containing a fully anonymous id [ ] . covid- watch, on the other hand, can automatically alert the user when they are in contact with an infected individual [ ] . similarly, the privacy-preserving automated contact tracing (pact) exploits the ble signals in combination with secure encryption to detect possible contacts while protecting privacy [ ] . most of the above initiatives assume that the ble signals will work for proximity detection without considering the effect of the smartphone's position. to the best of our knowledge, there is a lack of a comprehensive study on the accuracy of ble signals for contact tracing proximity sensing. furthermore, most of the encryption methods are based on information provided by the user, which might be subject to possible information leaks if the encryption method is compromised. to bridge the gap, this paper studies proximity sensing with the ble signals broadcast from the smartphones carried by the users while designing a privacy-preserving signature protocol that uses environmental features instead of user information for packet broadcasting. six experimental sets with different smartphone positioning are examined to investigate the feasibility of the system under different realistic conditions. ble provides short-range communication over the . ghz ism band [ ] . it is ubiquitous and has been adopted by many smart devices (e.g., smartwatches, earphones, smart thermostats, etc.) as the main communication platform [ ] . there are two modes of communication available with ble: ) non-connectable advertising, and ) connectable advertising [ ] . the latter advertising mode allows another device to request a connection by sending a connect_req packet on the advertising channels. in this work, we focus on the non-connectable advertising mode, in which the device cannot accept any incoming connection requests. this feature is useful for our sct system in ensuring no neighboring devices can access the smartphone to retrieve sensitive information. for contact tracing purposes, we configure the smartphone to periodically broadcast the advertising packet via the non-connectable advertising mode. these packets can be heard and received by any nearby smartphones as long as these smartphones are within the broadcast range.
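to make the advertising payload budget concrete, the sketch below packs a hash-derived signature into a legacy advertising payload; the field layout (flags plus manufacturer-specific data carrying a 16-byte signature), the placeholder company identifier, and the hashing step are illustrative assumptions, not the paper's exact packet format.

    # minimal sketch of fitting a signature into the 31-byte advertising data budget;
    # the layout and signature length are assumptions, not the paper's exact format.
    import hashlib, os, struct, time

    ADV_PAYLOAD_LIMIT = 31          # legacy advertising data budget in bytes
    COMPANY_ID = 0xFFFF             # placeholder/test company identifier

    def make_signature(env_features: bytes, epoch_minutes: int) -> bytes:
        # derive a 16-byte signature from ambient features and a coarse timestamp
        return hashlib.sha256(env_features + struct.pack("<I", epoch_minutes)).digest()[:16]

    def build_adv_payload(signature: bytes) -> bytes:
        # ad structure 1: flags (length, type 0x01, value)
        flags = bytes([0x02, 0x01, 0x06])
        # ad structure 2: manufacturer-specific data (type 0xFF) carrying the signature
        mfg = struct.pack("<H", COMPANY_ID) + signature
        mfg_struct = bytes([len(mfg) + 1, 0xFF]) + mfg
        payload = flags + mfg_struct
        assert len(payload) <= ADV_PAYLOAD_LIMIT, "payload exceeds advertising budget"
        return payload

    sig = make_signature(os.urandom(8), int(time.time() // 60))
    print(len(build_adv_payload(sig)), "bytes")   # 3 + 20 = 23 bytes, within the 31-byte limit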
these smartphones can also measure the received signal strength (rss) upon receiving the packet. however, there are two major challenges: 1) the length of the advertising packet is only up to bytes, and 2) the rss values are subject to severe fluctuation. 1) advertising packet: in the non-connectable advertising mode, the smartphone broadcasts the advertising packet over the three advertising channels periodically, according to the system-defined advertising interval t_a. the advertising interval defines how frequently a packet is broadcast. for example, if t_a = ms, we should expect to see at least packets per second. the advertising packet can take up to bytes, as shown in fig. . note that bytes are used for the preamble ( byte), access address ( bytes), header ( bytes), mac address ( bytes), and crc ( bytes), leaving only bytes for the information related to the environmental signature. this poses the question of how to construct a unique yet useful signature that can be encapsulated into this limited payload. 2) received signal strength (rss): following the inverse square law [ ], the rss is inversely proportional to the square of the distance. let p_r denote the signal strength in dbm; then p_r(d) = c − 10 n log_10(d), where d is the distance between the two devices, c is a constant coefficient, and n is the path loss exponent, whose value is subject to the environmental setting when the measurement is taken. as shown in fig. , different environments have different effects on the rss variation even when the distance between the two devices is the same (a small numerical sketch of this relation appears below). hence, we need to take the environmental factor into consideration when applying the path loss model to estimate the distance given the rss. section v provides a further discussion of our distance estimation approach that addresses the above problem. the intimacy of smartphones in our everyday life motivates us to adopt the smartphone for contact tracing purposes. however, there are privacy concerns about using such an intimate device for contact tracing [ ]. many users might worry that the sensitive information residing in their smartphone will be exposed to the public during contact tracing. our sct uses the non-connectable advertising mode; hence, none of the neighboring devices is able to connect to the user's device to retrieve any information. furthermore, we use a unique environmental signature that contains no information about the user's identity. research efforts that have tried to address this privacy issue provide better encryption methodologies [ ], [ ]. however, none of these works discusses contact tracing in private versus public locations. while most users might be willing to participate in contact tracing in public locations in the hope of flattening the disease-spreading curve, they might feel uncomfortable letting the contact tracing application run during their private time in private locations (e.g., home, bedroom, car, etc.). future work can use the embedded sensors on the smartphone to check whether the user is in a private or a public location, and then use this information to turn the contact tracing application on and off accordingly. iii. proposed smart contact tracing system there are two major phases in our sct system, as shown in fig. : the interaction phase and the tracing phase.
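before describing the two phases, the following is a minimal sketch, in python, of the path-loss relation discussed above: the received power in dbm falls off with the logarithm of distance, and the rate of decay depends on the path loss exponent of the environment. the 1-m reference value and the exponent values used here are illustrative assumptions, not the coefficients fitted later in the paper.

```python
import math

def expected_rss(d, c=-60.0, n=2.0):
    """log-distance path loss: expected rss (dbm) at distance d (meters).
    c is the rss at 1 m and n the path loss exponent; both values here
    are illustrative, not fitted from the paper's measurements."""
    return c - 10.0 * n * math.log10(d)

if __name__ == "__main__":
    # the same distance yields noticeably different rss in different environments
    for n in (1.8, 2.0, 3.0):   # e.g. open space vs. cluttered indoor setting
        print(f"n={n}: expected rss at 2 m = {expected_rss(2.0, n=n):.1f} dbm")
```

the spread across exponents is one way to see why an environment-agnostic model would mis-estimate distance, which motivates the calibration discussed in the distance estimation section.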
the interaction phase focuses on the following two main components: 1) the privacy-preserving signature protocol, and 2) precise proximity sensing; whereas the tracing phase aims to provide efficient signature matching. the interaction phase involves the day-to-day activities in public locations, such as workplaces, public transport, grocery stores, outdoor parks, etc. the contact tracing application starts automatically when it detects that the user is in a public location. the application executes the following functions: i. signature generation: the smartphone scans for the ambient environmental features. these features are selectively processed to generate a unique signature that can fit into the advertising payload. the signature is updated every few minutes. ii. signature broadcasting: the smartphone broadcasts the advertising packet containing the unique signature periodically, according to the advertising interval t_a. the packet is broadcast through the non-connectable advertising channels. iii. signature observation: the smartphone scans the three advertising channels to listen for advertising packets broadcast by neighboring smartphones. the scanning is performed in between broadcasting events. iv. proximity sensing: the smartphone measures the rss values and uses them to estimate how close it is to the neighboring smartphones. a neighboring smartphone is assumed to be in proximity when the distance is less than m. v. physical distancing alert: the smartphone triggers a real-time alert to warn the user to keep a healthy distance from nearby users when it detects any physical distancing violation. all the generated signatures and observed signatures are stored in the user's local storage, as shown in fig. . since a signature does not contain any information about its owner, there is no way for the user to trace or identify the original owner of an observed signature. furthermore, signatures are deleted from the local storage permanently once they expire. we define the expiration period for each signature based on the virus-spreading timeframe recommended by the health authorities. for instance, for covid- the expiration period should be days from the day the signature was recorded; after that period, the corresponding signature is deleted. if a user is diagnosed with an infectious disease, they can upload all their signatures to the signature database. in fig. , user a uploaded all his signatures to the signature database after he became an infected individual. the database then distributes the signatures to all users' smartphones. the signature matching computation takes place on each individual smartphone, and a local alert is triggered when there is a match (a minimal sketch of this local log and matching step is given below). a local alert means that the alert is triggered by the smartphone's own program, not a centralized alert sent by the server. the server is only used to distribute the data; no program/code is executed on the server to find close contacts. in this way, we protect users from revealing their identity and ensure that none of the matched cases can be eavesdropped on by malicious hackers. besides signature matching, the application also performs classification to assess the potential risk of a user, according to the time and distance the user spent with the infected individual. while the smartphone can use the non-connectable advertising mode to refuse any incoming connection request, we also need to ensure that the packet broadcasting itself does not reveal one's identity.
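the sketch referenced above, covering the on-device signature log, expiry, and local matching, might look as follows; the class name, the in-memory lists, and the 14-day expiry constant are illustrative assumptions rather than the paper's implementation.

```python
import time

EXPIRY_SECONDS = 14 * 24 * 3600   # illustrative expiry window; the actual
                                  # period follows health-authority guidance

class SignatureLog:
    """minimal sketch of the on-device log of own and observed signatures."""
    def __init__(self):
        self.own = []        # list of (timestamp, signature_bytes)
        self.observed = []   # list of (timestamp, signature_bytes, rss)

    def purge_expired(self, now=None):
        """permanently drop entries older than the expiry window."""
        now = time.time() if now is None else now
        self.own = [e for e in self.own if now - e[0] < EXPIRY_SECONDS]
        self.observed = [e for e in self.observed if now - e[0] < EXPIRY_SECONDS]

    def match(self, infected_signatures):
        """local matching against signatures distributed by the server;
        returns the timestamps of matching observations, so the alert
        can be raised on the device itself."""
        infected = {bytes(s) for s in infected_signatures}
        return [ts for (ts, sig, _rss) in self.observed if bytes(sig) in infected]
```

because the matching runs entirely inside this local structure, the server never learns which, if any, of the distributed signatures produced a match.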
several methods have been suggested to protect the user's identity by using an encrypted packet [ ], [ ]. however, these methods require a randomly generated secret key that can be compromised. for instance, in the tracetogether application, a secret key is used to encode the phone number of the user; if the secret key is hacked, then the phone number can be retrieved. in this work, we propose to use the ambient environmental signal to construct a signature vector that can fit into the advertising packet. when the application starts contact tracing, it first generates a signature that fits into the advertising packet. the signature is a transformed vector containing the ambient environmental features. when the smartphone scans for the packets broadcast by nearby smartphones, it may also see other ble devices. for example, in a grocery store, the smartphone might see the ble beacon attached to a promotional item, or the ble signal from a smartwatch, apple pencil, smart thermostat, smart lighting control, etc. the signal strength of each of these devices observed by a user's smartphone changes depending on the location of the user. furthermore, some of these devices (i.e., a smartwatch, an apple pencil) might not always remain at the same location. let p_r(d) be a function that returns the time-averaged rss value measured from a ble device located at a certain distance d from the smartphone, and let b = {b_1, b_2, . . . , b_m} be the set of ble devices excluding the smartphones used for contact tracing. the observed vector can then be expressed as o_u(t) = [p_r(d_1), p_r(d_2), . . . , p_r(d_m)]^t, where o_u(t) ∈ r^m is an m-dimensional vector observed by the smartphone of user u at time t. the length m of the vector depends on the size of b. rather than truncating the vector when m exceeds the signature length, or padding it with arbitrary values when m is smaller, we define a dictionary Ψ ∈ r^(k×m), with k the fixed signature length, to transform the m-dimensional vector into a k-dimensional vector. this dictionary is also known as the secret transformation key for the observed vector, and can be written column-wise as Ψ = [Ψ_1, Ψ_2, . . . , Ψ_m], where Ψ_j = (ψ_1,j, ψ_2,j, . . . , ψ_k,j)^t is the j-th column vector of the dictionary. by multiplying the dictionary with the observed vector, we obtain our unique signature vector s_u(t) = Ψ o_u(t) (a minimal sketch of this transformation is given below). the dictionary for each smartphone is different: even if two smartphones observe the same ambient features, i.e., similar time-averaged rss values from a similar set of ble devices, the generated signature vectors are still different. note that the above case is very rare, because even if both persons appear at the same location at the same time, the rss values from the same set of ble devices might differ due to the receiver sensitivity of the smartphone, the smartphone's antenna, the position and orientation of the smartphone, etc. upon generation of the signature, the smartphone encapsulates this signature information into its advertising packet and broadcasts the packet through the non-connectable advertising channels. the advertising interval determines the broadcasting frequency. [fig. : timing diagram for the advertising, scanning, and signature generation activities. all the generated and observed signatures are logged in the local database, together with a timestamp τ.] suppose that t_a = ms; then we should expect about packets per second. however, this also depends on the scanning window and the scanning interval of a smartphone.
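the minimal sketch of the signature transformation referenced above: an m-dimensional vector of time-averaged rss observations is projected through a per-device random dictionary into a fixed-length signature. the signature length, the gaussian dictionary construction, and the sample rss values are illustrative assumptions; the paper's actual payload size and dictionary generation are not reproduced here.

```python
import numpy as np

PAYLOAD_DIM = 16   # illustrative signature length; in practice it is bounded
                   # by the space left in the advertising payload

def make_dictionary(m, dim=PAYLOAD_DIM, seed=None):
    """per-device secret transformation key psi (dim x m)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, m))

def make_signature(observed_rss, dictionary):
    """project the m-dimensional observed rss vector onto a fixed-length
    signature vector, s = psi @ o."""
    o = np.asarray(observed_rss, dtype=float)
    return dictionary @ o

# two phones that observe the same ambient devices still emit different
# signatures, because each holds its own secret dictionary
observed = [-71.0, -55.5, -80.2, -63.1]            # time-averaged rss values
sig_a = make_signature(observed, make_dictionary(len(observed), seed=1))
sig_b = make_signature(observed, make_dictionary(len(observed), seed=2))
print(np.allclose(sig_a, sig_b))                   # False
```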
more precisely, the smartphone can only see the packet when its scanning activity overlaps with the advertising activity. the timing diagram for the advertising, scanning, and signature generation activities is shown in fig. . each activity is triggered periodically according to its interval, i.e., the generation interval t_g, the advertising interval t_a, and the scanning interval t_s. given t_s, the smartphone will only stay active to listen for incoming packets for a duration defined by the scanning window t_w. as shown in fig. , smartphone a fails to receive s_b(t) from smartphone b the first two times, since there is no scanning activity in smartphone a when s_b(t) arrives. however, smartphone a manages to receive s_b(t) when smartphone b broadcasts the same packet a third time. according to [ ], the likelihood of seeing the advertising packet broadcast by neighboring smartphones is high as long as t_a < t_s. intuitively, when the broadcasting frequency is higher than the scanning frequency, and the scanning window is sufficiently long, it is likely that one of the broadcast packets from a meets the scanning window of b. we could use continuous scanning (i.e., set t_w = t_s) to increase the packet receiving rate. however, such a scanning approach can greatly increase the energy consumption of a smartphone and eventually create an adverse effect on the user's experience. this work mainly focuses on the privacy and preciseness of contact tracing; balancing energy consumption and packet receiving rate is a possible direction for future work. the smartphone logs the generated signatures and the observed signatures in its own local storage, as shown in fig. . a timestamp τ is logged when the smartphone saves a signature into the local database. we could use either a sql or a nosql approach to construct this database. the logged timestamp is useful for examining the total time two persons spend in close proximity to each other. note that for an observed signature, we also log the rss value measured upon receiving the packet; this rss value provides useful information for proximity sensing. proximity sensing has been employed in many scenarios, including identification of the user's proximity to museum collections [ ] and gallery art pieces [ ]. there are also works studying proximity detection in dense environments [ ] and proximity accuracy with filtering techniques [ ]. however, most of these works study the proximity detection between a human and an object attached to a ble beacon [ ]; there is no work studying the proximity sensing between devices carried by humans. we use the rss to infer the distance between any two smartphones [ ]. given the rss value p_r, the distance of a smartphone to the smartphone that broadcast the packet can be estimated as d = 10^((c − p_r)/(10 n)), where n is the path loss exponent and c is the constant coefficient; both n and c can be obtained through least-squares fitting (a minimal sketch of this estimation is given below). given the distance, we can then determine whether the user follows the safe physical distance recommended by the health authorities; an alert is sent to remind the user if they violate the physical distancing rule. in the distance estimation context, accuracy indicates how close an estimated value is to the true value; in other words, for an accurate estimation, the error between the estimated value and the true value is close to zero. precision, on the other hand, tells whether any two estimated values fall into the same region given similar measurement input (i.e., the rss value).
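a minimal sketch of the distance estimation and proximity rule described above, assuming numpy is available: the path loss model p_r = c − 10·n·log_10(d) is fitted by linear least squares on calibration pairs (the values below are synthetic and purely illustrative, not the paper's measurements), inverted to estimate distance, and thresholded into high-/low-risk.

```python
import numpy as np

# synthetic calibration data: (distance in meters, mean rss in dbm)
cal_d   = np.array([0.5, 1.0, 2.0, 3.0, 5.0, 8.0])
cal_rss = np.array([-52.0, -60.0, -66.0, -70.0, -74.0, -78.0])

# fit p_r = c - 10*n*log10(d); this is linear in log10(d), so an ordinary
# least-squares line fit recovers the slope (-10*n) and intercept (c)
slope, c = np.polyfit(np.log10(cal_d), cal_rss, 1)
n = -slope / 10.0

def estimate_distance(rss):
    """invert the fitted path loss model: d = 10**((c - rss) / (10*n))."""
    return 10.0 ** ((c - rss) / (10.0 * n))

def classify_risk(rss, threshold_m=2.0):
    """proximity rule; the 2 m threshold is an illustrative stand-in for
    the local physical-distancing guidance."""
    return "high-risk" if estimate_distance(rss) <= threshold_m else "low-risk"

print(round(estimate_distance(-68.0), 2), classify_risk(-68.0))
```

the same fitted pair (n, c) is reused for every subsequent rss sample, which is why the calibration environment matters for the quality of the estimate.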
for contact tracing purposes, an accurate distance estimation is not as critical as precise proximity sensing. we do not need an accurate estimation to tell whether the user is in proximity to the infected individual; rather, a precise estimation is more critical in determining the risk of a user. in particular, we consider that a user belongs to the high-risk group when the user is in close proximity (i.e., d ≤ m) to the infected individual; otherwise, the user is considered to be in the low-risk group. the problem of classifying the risk of a potential contact can be modeled as a binary hypothesis test. in particular, consider a risk mapping function r : d −→ {+1, −1}, where +1 indicates high risk and −1 low risk; then there are three hypotheses, including the null hypothesis, since we also need to consider the cases of false positives and false negatives. a false positive, also known as a false alarm, occurs when a user actually belongs to the low-risk group but the system wrongly classifies them into the high-risk group. a false negative, on the other hand, wrongly classifies a user into the low-risk group although they were actually in close proximity to the infected individual. let h_+ denote the hypothesis that the user belongs to the high-risk (+1) group and h_− the hypothesis that the user belongs to the low-risk (−1) group; then the possible hypotheses are h_+ : r(d) = +1, h_− : r(d) = −1, and h_0 : r(d) = 0, where r(d) = 0 means that the user was not in contact with the infected individual. this is the case when the signature matching returns null, which means the user did not encounter any infected individual. let h, l, and a be the ground truth labels for high-risk, low-risk, and absence (i.e., the user was not in contact with the infected individual). the possible classification outcomes given the above hypotheses are illustrated in fig. . it is obvious that a miss detection is undesirable, because the user might be at risk while the system considers the user safe. a false negative misclassifies a high-risk user as low-risk; in comparison to a miss detection, it at least detects the user, but it may give the user the wrong impression that their chance of being infected is low when it could actually be high. a false positive misclassifies a low-risk user as high-risk; even though it is somewhat conservative to alarm users that they are likely to be infected when they may not be, this is a relatively safer outcome than a miss detection or a false negative. we have developed a smartphone application to demonstrate our proposed sct. the application has the following two functions: 1) contact tracing based on the privacy-preserving signature protocol, and 2) physical distancing alert based on precise proximity sensing. we describe our experimental setup and then discuss our experimental results by comparing the performance with another five classifiers, i.e., decision tree (dt), linear discriminant analysis (lda), naive bayes (nb), k-nearest neighbors (knn), and support vector machine (svm). [fig. : for experimental purposes, we created another version of the application by (a) adding a manual button to control the start and the end of the experiment, (b) an input field to key in the true distance measured with a measuring tape, and (c) a save button that saves the measurement data.] we built an android application to demonstrate the functionalities of our proposed sct. first, the application generates a signature packet according to the privacy-preserving signature protocol.
then, it pushes an alert notification when the user violates the physical distancing rule. all the generated signatures, observed signatures, and their corresponding signal strengths are stored in the smartphone's storage. we installed the application on android smartphones, including a nokia . with android and an htc m with android . nougat. at least api is required for the ble to operate; according to google, at least % of smartphones support api . if a user has a lower api version, they can still use the application, but only to receive ble signals; other iot devices, such as ble beacons, can be used by these users to enable them to transmit signals [ ]. when running the application, the power requirement is less than . w, which is negligible. for experimental purposes, we created another version of the application that allows us to log the ground truth during the experiment, so that we can compare the estimation with the real data; we use this ground truth to evaluate the classification performance. the following information is logged: the true distance, the name of the smartphone, the mac address of the ble chipset, the packet payload, the rss values, the time elapsed, and the timestamp. the time elapsed indicates the time difference between the previous broadcast packet and the current broadcast packet, whereas the timestamp is the exact time when the smartphone received the packet. the true distance is measured with a measuring tape during the experiment, as shown in fig. . in this experiment, both users are required to hold the smartphone in their hand while doing the measurement; however, the position of the hand is not fixed, and the user can hold the smartphone however is comfortable for them. the android ble api only provides three possible advertising interval settings, as shown in table i . in our experiment, the application is configured to advertise in "advertise mode low latency". the application starts the scan when the user presses the scan button, and the scan continues until the user presses the button again. we repeated the experiment for distances from . m to . m (with a . m increment each step), and from m to m (with a m increment each step); hence, there are a total of distance points at which the measurement is conducted. for each distance, the application was run for at least s. all the measurement data are saved in a comma-separated values (.csv) file format and exported to matlab for further analysis. there are a total of , data points collected. the statistical descriptions of our experimental data are shown in table ii . we can see that the variance is high at some distance points. this is mostly due to multipath effects, in which the signal takes multiple paths to reach the receiver; at the receiving end, the signals might add up constructively or destructively, resulting in many different rss measurements even when the distance is the same. furthermore, reflection by moving objects in practical environments might cause undesirable outliers. to mitigate possible outliers, we apply a moving average filter to the raw data (a minimal sketch of this filter follows below). a comparison between the raw rss data and the filtered data, for d = { . , , , , } m, is shown in fig. . it is clear that the filtered data provides a much smoother signal; however, there are still variations in signal strength even though the distance is the same.
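the minimal sketch of the moving-average filtering referenced above, assuming numpy; the window size and the sample rss values here are illustrative, not the ones used in the experiments.

```python
import numpy as np

def moving_average(rss, window=5):
    """smooth raw rss samples with a simple moving-average filter;
    the window size is an illustrative choice."""
    rss = np.asarray(rss, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(rss, kernel, mode="valid")

raw = [-60, -72, -58, -75, -61, -59, -74, -60, -62, -73, -59, -61]
print(np.round(moving_average(raw, window=5), 1))
```

larger windows suppress more of the multipath-induced fluctuation but also delay the filter's response, which is the trade-off examined in the window-size experiments below.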
in practice, it is hard to obtain only the line-of-sight (los) signal in indoor environments, due to the reflection and diffraction of the signal. extracting multipath profile features to achieve better distance estimation could be a possible future direction. we applied non-linear least-squares fitting to determine the values of the coefficients n and c in eq. ( ), using the mean rss described in table ii as the dependent variable and the distance as the independent variable. using this fitted path loss model, we can then estimate the distance by feeding the rss value measured at each time step into the model. the ultimate goal is to classify whether the user belongs to the high-risk or the low-risk group, assuming the other user is infected with the disease. based on the physical distancing rule recommended by the canadian health authority [ ], we classify the user as high-risk if the estimated distance is ≤ m and low-risk if the estimated distance is > m. we compared the fitted path loss model with five machine learning-based classifiers: dt, lda, nb, knn, and svm. we separated the measurement data into % training data and % testing data, and the input rss was encoded into an -bit binary feature. we used a confusion matrix to evaluate the performance of each classifier, as shown in fig. (a minimal sketch of this comparison is given below). overall, the dt method achieves the highest accuracy, i.e., . %. however, if we examine the matrix, we can see that dt also produces a very high false negative rate, i.e., it incorrectly classifies the high-risk group as low-risk . % of the time. this is not a desirable result for contact tracing purposes, because those in the high-risk group are the people who are very likely to be infected, yet the dt method classified them as low-risk. on the other hand, both nb and knn have a higher false positive rate, i.e., . %. this is relatively acceptable, as it is better to be conservative than to miss a case. while the overall results are acceptable, there is still room for improvement. instead of using the raw measurements, we compared the results with preprocessed data; furthermore, we would like to understand how the distancing threshold affects accuracy. 1) implications of the filter window: as shown in fig. , we can mitigate possible outliers by preprocessing the data. note that fig. is obtained by applying the moving average with a window size equal to . we further examined the effect of the window size on the risk classification performance, and the results are shown in fig. : the accuracy of most classifiers improves as the window size increases. however, nb does not show any performance gain with increased window size, and lda starts to fluctuate when the window size increases, which could be due to overfitting during the training process. overall, we see that the performance starts to saturate when the window size is more than . the performance gain with respect to window size is shown in fig. . the performance gain is computed by subtracting the accuracy obtained with the raw rss from the accuracy obtained with the filtered rss, and then dividing by the accuracy obtained with the raw rss. from fig. , we can see that pl is the method that benefits most from the filtered rss, with a higher performance gain than the rest. in particular, the accuracy obtained via pl increases from . % to . %, which corresponds to a . % performance gain. however, not all the methods benefit from the filtered rss; some methods, for example svm, show a performance drop when the window size increases.
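the minimal sketch of the classifier comparison referenced above, assuming scikit-learn is available; the synthetic rss data, the noise model, the 70/30 split, and the single-feature encoding are illustrative assumptions (the paper encodes the rss into a binary feature and trains on its own measured dataset).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# synthetic measurements: true distances, noisy rss generated from a
# log-distance model, and high-risk labels for distances <= 2 m
rng = np.random.default_rng(0)
true_d = rng.uniform(0.5, 8.0, 2000)
rss = -60.0 - 20.0 * np.log10(true_d) + rng.normal(0.0, 4.0, true_d.size)
X, y = rss.reshape(-1, 1), (true_d <= 2.0).astype(int)   # 1 = high-risk

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
models = {"dt": DecisionTreeClassifier(), "lda": LinearDiscriminantAnalysis(),
          "nb": GaussianNB(), "knn": KNeighborsClassifier(), "svm": SVC()}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    print(name, "accuracy:", round(accuracy_score(y_te, pred), 3),
          "false-negative rate:", round(fn / (fn + tp), 3))
```

reading the false-negative rate next to the accuracy is the point of the exercise: as discussed above, a classifier with high overall accuracy can still be a poor choice for contact tracing if it frequently labels high-risk contacts as low-risk.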
even though dt and lda can achieve better performance than the pl model, both of these methods require extensive training, and their accuracy may drop when there is not sufficient data for training. 2) implications of the physical distancing threshold: as discussed previously, our sct system classifies a contact as high-risk or low-risk according to the physical distancing rule recommended by the health authority. while we might incorrectly classify a high-risk contact as low-risk due to the fluctuation of the rss values, we observed that the classification accuracy in fact increases when the distancing threshold is smaller. this is preferable, as we would definitely like to classify the user as high-risk when the user is very close to the infected person. we plotted the accuracy score for different distance thresholds for pl, dt, and lda; these three methods are selected because they showed good performance, as discussed previously. we compared the accuracy scores between raw data and filtered data, setting the window size to ; this window size is selected based on the window effect on the accuracy score discussed previously. the accuracy is high when the distancing threshold is less than m, as shown in fig. . [fig. : the effect of distance thresholds (i.e., the distance rule used to classify high-risk and low-risk contacts) on the accuracy.] the system might produce some false negatives, but this mostly happens for the group of users at a distance between m and m from the infected individual. interestingly, we also see that the accuracy increases when the distance threshold is increased from . m to m. this indicates that the system is somewhat conservative, in that it tends to classify the user as high-risk when the distancing threshold is more than . m. overall, it is safer to have a high false positive rate than a high false negative rate, especially if the virus is very contagious. we extended the experiment to investigate the proximity sensing performance in connection with the position of the smartphone on the body. the reason is that users might not carry the smartphone in their hand most of the time: when they are walking on the street or doing grocery shopping, they might carry the phone in their hand, in their pocket, or in a backpack. as shown in fig. , we consider five additional cases on top of the "hand-to-hand (hh)" case discussed previously. all the measurement data collected for these six cases can be found in the ieee dataport [ ] and github [ ]. there are a total of , data points, of which hh contributes , data points. the additional five cases and their total data points are listed as follows: • hand-to-pocket (hp): one user carries the phone in their hand and the other user keeps the phone in their pocket. there are a total of , data points collected for this case. the remaining cases, hand-to-backpack (hb), pocket-to-backpack (pb), pocket-to-pocket (pp), and backpack-to-backpack (bb), are defined analogously. previously, we verified that the accuracy of the distance estimation affects the classification performance. hence, we examine the distance estimation performance for all the five cases above. in particular, we use the mean absolute error (mae) to compute the error between the estimated distance and the ground truth distance. the cdf of the distance estimation errors for all six cases is plotted in fig. . it is obvious that the filtered data achieves better performance; the window size is selected based on the justification provided in fig. . from the cdf, we can see that, for % of the time, the error is less than . m for the hh case, . m for the hp case, . m for the hb case, . m for the pb case, . m for the pp case, and
. m for the bb case. we observe that pb has the worst performance. this can be explained by the fact that the signals from the two smartphones suffered different paths of attenuation; hence, even though we tried to calibrate the model based on the environmental factor, the model is unable to capture such variations. for the hh, pp, and bb cases, the signals on both sides suffer similar paths of attenuation. take the bb case for example: one smartphone observes a signal blocked by a human body, since the smartphone on the other side is located inside a backpack, and the smartphone on the other side likewise observes a signal blocked by a human body. hence, as long as both smartphones measure the signal from similar positions, the system is likely to produce a good estimation. we also examined the classification performance for all six cases. table iii shows the classification accuracy obtained using the raw data and the filtered data. from the table, we can see that dt achieves the best performance, with more than % accuracy for all the cases. compared to classification based on distance estimation, the machine learning approach is more robust to the signal variations caused by body shadowing: rather than estimating the distance, these classifiers try to memorize the output given the labeled input during the training process. hence, the amount and the validity of the data used during training are very important for obtaining a good classifier. in general, the machine learning approach can be adopted when sufficient training data are available; otherwise, the pl model is the best choice for instant proximity sensing. the accuracy of the pl model increases when the time duration users spend in contact increases, as shown in fig. . in general, when the duration increases, the smartphone is able to observe more signals, which helps to produce a better distance estimation and increases the classification accuracy. however, the accuracy starts to saturate after s for most of the cases, except the hb case. this result indicates that the smartphone has already observed sufficient rss data for making the most accurate distance estimation when the time duration is at least s. the accuracy of the hb case, on the other hand, drops when the time duration increases: since the signals arrive at the two smartphones through different attenuation paths, the more signals the smartphone observes, the more confused it becomes in making a correct estimation. note that both the hb and pb cases converge to the same accuracy score when the time duration increases. it is clear that the varying attenuation paths due to the positions of the smartphones on different parts of the body can severely affect accuracy. future work can study the effect of attenuation paths on both sides and come up with an adaptive path loss model that can cater to such diversity in attenuation paths. vii. conclusions contact tracing is an essential measure in containing the further spread of a highly infectious disease. we proposed a smart contact tracing (sct) system that provides precise proximity sensing and classifies the risk of an encountered contact while providing a privacy-preserving signature protocol. from the experimental results, we verified that a ble-based system for contact tracing is a promising solution for epidemic control and prevention. our sct system offers tangible results on using rss values for proximity sensing between two human beings.
we have also shared the dataset in an open-source repository to encourage further research.

references:
next generation technology for epidemic prevention and control: data-driven contact tracking
contact tracing and disease control
five things we need to do to make contact tracing really work
quantifying sars-cov- transmission suggests epidemic control with digital contact tracing
smartphones and ble services: empirical insights
ble beacons for internet of things applications: survey, challenges, and opportunities
improved distance estimation with ble beacon using kalman filter and svm
rss localization using unknown statistical path loss exponent model
body shadowing and furniture effects for accuracy improvement of indoor wave propagation models
china launches coronavirus 'close contact detector' app
coronavirus mobile apps are surging in popularity in south korea
privacy guidelines for contact tracing applications
tracesecure: towards privacy preserving contact tracing
pan-european privacy-preserving proximity tracing
we put the power to reduce the spread of covid- in the palm of your hand
pact: private automated contact tracing
overview and evaluation of bluetooth low energy: an emerging low-power wireless technology
bluetooth: a viable solution for iot?
secure seamless bluetooth low energy connection migration for unmodified iot devices
rssi-based localization for wireless sensor networks with a mobile beacon
projecting the transmission dynamics of sars-cov- through the postpandemic period
epic: efficient privacy-preserving contact tracing for infection detection
modeling neighbor discovery in bluetooth low energy networks
ble beacons for indoor positioning at an interactive iot-based smart museum
notify-and-interact: a beacon-smartphone interaction for user engagement in galleries
high resolution beacon-based proximity detection for dense deployment
improving ble beacon proximity estimation accuracy through bayesian filtering
a compressive sensing approach to detect the proximity between smartphones and ble beacons
face-to-face proximity estimation using bluetooth on smartphones
ble beacons in the smart city: applications, challenges, and research opportunities
physical distancing: how to slow the spread of covid-

key: cord- -vsoc v authors: jiang, helen; senge, erwen title: usable security for ml systems in mental health: a framework date: - - journal: nan doi: nan sha: doc_id: cord_uid: vsoc v

while the applications of and demands for machine learning (ml) systems in mental health are growing, there is little discussion of, or consensus on, a uniquely challenging aspect: building security methods and requirements into these ml systems while keeping them usable for end-users. this question of usable security is very important, because a lack of consideration for either security or usability would hinder large-scale user adoption and active usage of ml systems in mental health applications.
in this short paper, we introduce a framework of four pillars and a set of desired properties which can be used to systematically guide and evaluate security-related designs, implementations, and deployments of ml systems for mental health. we aim to weave together threads from different domains, incorporate existing views, and propose new principles and requirements, in an effort to lay out a clear framework where criteria and expectations are established and used to make security mechanisms usable for end-users of those ml systems in mental health. together with this framework, we present several concrete scenarios in which different usable security cases and profiles of ml systems in mental health applications are examined and evaluated. with a mental health crisis looming large and many ml systems being built for mental health use cases, it is challenging to trace, analyze, and compare all the designs and implementations of such systems. so far, there is a lack of a well-defined framework that describes properties relating to the security of such ml systems in mental health, and even less consideration is given to how such security mechanisms can be made usable for those systems' end users. however, without usable security, undiscovered, undisclosed, and ill-considered limitations and properties of security decisions would hold back large-scale adoption and usage [ ] of ml systems in mental health use cases. for more detailed and nuanced discussions, see our treatment at section . . the goal of this framework is to establish discussions in the communities of mental health, ml, and security, so we can build a common ground of directions and expectations for usable security in ml systems used in mental health scenarios. moreover, this framework serves to raise awareness, so that both the ml and mental health communities will heed this critical aspect of usable security in ml systems for mental health. we hope that this new, interdisciplinary framework will allow researchers and practitioners to systematically compare usable security attributes across ml systems for mental health, while also identifying potential limitations of particular approaches and trade-offs in different scenarios. in this short paper, we propose that ml systems in mental health use cases, beyond the privacy and security requirements already mandated by legislation and regulations -for example, the health insurance portability and accountability act (hipaa) [ , , ] in the united states, and the general data protection regulation (gdpr) in the european union and its member states' national laws [ , ] -should consider the properties of usable security proposed by this framework's four pillars, and be evaluated on their (1) context models, (2) functionality criteria, (3) trustworthiness requirements, and (4) recovery principles across their life cycles. this work presents our effort to generate discussions and consensus for a common framework in a naturally interdisciplinary area. we built our research on the foundation of computer security research, which has a rich history and long tradition of devising criteria and evaluation rubrics for system designs and implementations. we also incorporated important and recent literature from human-computer interaction (hci), usable security, and fairness, accountability, and transparency (fat) research in ml. weaving these interdisciplinary threads together, we hope that our framework will benefit both researchers and practitioners working on ml systems in mental health.
there is a long and distinguished tradition in computer security research: presciently define evaluation criteria and structure assessment frameworks, while research communities were still in their early stages of formation. from this tradition, many remarkable security research outcomes have flourished, and guided the design and building of systems and infrastructure we rely on today [ , , , , , , ] . however, while the pioneers of security research laid down "psychological acceptability" of users as a key principle for secure system design and implementations [ ] , this principle has not been actively researched within the security community until much later while security measures keep confusing even experts [ , , , , ] . moreover, the "psychological acceptability" principle is often doubted as incompatible with the goal of "security" [ , , , , , , , ] , and much of usable security research has traditionally been done in the hci community, and usable security is still a small community [ , , ] compared to other areas of security research. while "psychological acceptability" principle is first identified as the meaning of "usable" in "usable security" [ , ] , there are other efforts trying to precisely define "usability" especially in hci contexts, based on the "human-centered" attribute of interactive systems. a prominent example is iso - [ ] : "usability" is "the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use". built on this definition, [ ] of nist proposed measurements on usability evaluations. however, as [ , ] both point out, measurement of usable security can be highly diverse and context-dependent, meanwhile, such measurements and evaluations focus on the system and its interactions with targeted users, often done with small groups in controlled environments [ , ] , with security as the users' top concern. while the "security is top priority" assumption can be very reasonable for use cases such as national and corporate security, the same assumption likely does not stand when we are evaluating ml systems in mental health use cases, in which users have diverse top priorities. this complicates the already fragmented landscape [ ] of usable security, and while ml applications in mental health and fat ml research are booming [ , ] , they still do not take usability and security into serious consideration. our framework evaluates usable security of ml systems in mental health based on four pillars. each pillar, in turn, serves as the top concern for each major phase of the computer system life cycle, which can be summarized as: ( )design and implementation; ( )deployment; ( )mass adoption and usage; and finally, ( )maintenance and/or disposal [ , , ] . ( ) context: this pillar considers the intended operational environment of the ml system, and how it is designed and built to interact with different types of users with varying purposes, goals, and maliciousness. this pillar is most important during the design and implementation phase of ml systems for mental health. ( ) functionality: this pillar tackles the well-known securityfunctionality trade-off [ , , , , ] . 
keeping ml systems functional while making security usable, it is imperative to ask questions about the complexity and resourceintensity of security methods within the already complex and often resource-intensive ml system, the flexibility of chosen methods to accommodate future security requirements, and how they influence user interactions with the ml system. this pillar is most crucial in the deployment phase of ml systems, especially in the initial stage, when such system is in limited use, without users' significant investment of trust and time. ( ) trustworthiness: this pillar is by nature user-centered. many non-expert, lay users are already distrustful and leery of ml, and this set of requirements show that on the matter of security and usability, ml systems may still induce users' trust in the sensitive context of mental health. this pillar is the most critical in achieving active usage and large-scale adoption [ ] of secure ml systems for mental health. ( ) recovery: this pillar handles perhaps one of the toughest challenges in both security and usability: what happens, should a security incident (e.g. a breach, a compromise, or a previously undiscovered vulnerability) happens? what are we going to do with the system and users, now and later? how do we account for the incident this time, to minimize the chance that it would happen again? this pillar is the top priority in maintenance and/or disposal phase of the computer system life cycle. the list can help ask the right questions for designing and building usable security [ ] into ml systems for mental health: it determines what and how much "usability" to be considered in a security environment, and move from the more general security threat models, to specific cases of user interactions in mental health scenarios, and also to weigh in negative use cases. the properties below are agnostic to programming languages, software stacks, deployment platforms, and hardware specifications, so they are also flexible enough to accommodate a large class of usable security scenarios for ml systems in mental health. c asset audit. ml systems in mental health almost inevitably acquire information assets while in use, for example, it may include users' locations, device types etc., as well as patients' functional status information, providers' notes, and organization's intervention plans. understandably, existing regulations mostly focus on these acquired assets. however, in ml systems, "asset" is not only acquired, but also native to the system itself: its algorithms and models, ground truths, datasets, decision-making logic, and result evaluations, etc. therefore, identifying both native and acquired assets of the ml system is critical for usable security. c target user profiling. ml systems can be utilized by different stakeholders in mental health: from patients, providers, to government officials, they use the system to achieve different goals. profiling the system's targeted users is the basis to make concrete observations and reasonable estimations, which are then incorporated into design and implementation requirements. knowing the targeted users and what they use the ml system for, this is usable security's positive case: legitimate users can establish trusted paths and use the system without being hindered by its security requirements. usable security's negative case is given in c . c behaviors categorization. 
behaviors of targeted legitimated users described in c can be either expected by the system, or unexpected and cause the system to fail, error out, or even trigger security incidents. while it is not possible to iterate through all unexpected behaviors from legitimate users, unexpected user behaviors raise two key components of usable security, and need to be addressed in design and implementation: ( ) motivating users to behave in a secure manner so to minimize the systems' failures, errors, and security exposures, because users are not the enemy [ ] ; ( ) when such motivations fail, follow the "fail-safe" principle [ ] , meanwhile deliver warning messages about security and failures with usability in mind [ , , ] . this property is interdependent with f , where we discuss robustness. c threat modeling. once the assets are audited and target users and behaviors profiled, threat modeling is essential for security, as threat modeling is a well-studied and used subject in computer security [ , , , ] . there are three main components to consider: ( ) assets the ml system needs to protect, ( ) scope of interactions between system and user based on c ; and ( ) malicious actors and their actions the systems need to defend against. in contrast to c , malicious actors are usable security's negative case: malicious users are stopped or slowed down by the system's security measures. the following properties are most useful when seen from a deployment perspective. they describe how a ml system works with inplace security requirements while interacting with users. f complexity. most, if not all ml systems and applications have at least one of the three constraints: time, memory, and computational power. therefore, any security measures should consider these constraints and its impact on how well the ml system serves the end users. for example, in a high concurrency event where many users are utilizing the same ml system, if a given security method uses negligible computational power resource on users' end but consumers a lot of system resources, we should consider alternatives for this security method. to measure such complexity, we can use either formal algorithmic complexity notions (e.g. big o, little o), or empirical evaluations. for example, in -user, -user, , -user concurrency scenarios, what is the average computational overhead or latency for specific sets of security requirements, with other software and hardware constraints stay the same. f availability. for large-scale ml systems, e.g. mental health use cases with multiple targeted user groups, security measures also need to scale. availability evaluates how well security methods can generalize to cover a ml system's targeted users and behaviors (c , c ) without hindering their access to the ml system. a quantitative heuristic for availability of security measures is estimated user adoption rates across user groups, as well as among the genera user base. notice that the availability criteria is a trade-off to the "least common mechanism" principle for secure system design [ ] , and the relative importance between the two are dependent on results of context modeling, in particular c . regardless of which one of the two weighs more heavily in specific scenarios, the security mechanism in question must be carefully designed, judiciously implemented, and rigorously tested before real user runs. f flexibility. 
retrofitting security to usability is usually a bad idea and doesn't work well [ , ] , therefore it is important to not only prioritize usable security when designing and building ml systems for mental health, but also to not let current implementations become roadblocks to additional security requirements or system capabilities. having flexibility accommodates future changes in the system and shifting user base, and is a long-term commitment to the system's usable security traits. f experience validation. to ascertain that security measures did not hold back users, it is crucial to validate real user interactions and experience with the system, regardless of the methods: ideal controlled environments, synthetic experiments, or random sampling. for positive case of usable security (c ) that makes the system more secure but not harder for legitimate users, conducting user studies to evaluate their experiences, interactions, effectiveness, and satisfactions with the system [ , ] would be indispensable evidence for the ml system's real-world usability. f robustness. robustness is well-researched in computer system [ , , , , ] , and recent interests in adversarial ml [ , , ] has early roots in ml robustness [ ] . in our consideration, robustness is also related to recovery principles in section . , and has two angles: ( ) for security, to tolerate and withstand certain errors and faults from the ml layer, the system layer, and user interaction layer; and ( ) for usability, to communicates to users clearly and timely, when trusted paths cannot be established because of scenarios exceeding ( )'s robustness levels. interdependent with this criterion is c for unexpected user behavior categorization. many non-experts are suspicious and distrustful of ml, because of ml's "blackbox magic" reputation. moreover, the technical nature of fat ml methods has not endeared lay users towards machine learning either. now, suppose that another layer of hard-to-use and hard-to-navigate security measures and designs is added to an ml system, such distrust is perhaps only going to grow more intense and open. while the users' sentiment of distrust is understandable, the need for good mental health is agnostic about one's feelings towards machine learning and usability of security designs. therefore, to enable active usage and large-scale adoption [ ] of secure ml systems in mental health cases, it is important to first induce users' trust in the ml systems used, before their active utilization of such ml systems. the trustworthiness requirement suggests how ml systems in mental health may still earn users' trust, through its security and usability, by well-designed user interactions and communications. t clarity. articulating relevant security mechanisms, and their intents, impacts, and implications to users, is fundamental to trustbuilding. we identified three clarity aspects: ( )clarity of ml, where certain artifacts of the ml system's decision-making logic and process (e.g. summary statistics, explanations for classification labels) are exposed and explained to user in non-technical manners; ( )clarity of security, where user-facing security mechanisms (e.g. trusted path establishment, or revocation of access delegation), and these mechanisms' intents and purposes, are disclosed before users engage in these security mechanisms and take actions, preferably in non-technical terms; and ( )clarity of failure modes, where recovery (section . 
) plan in case of security incidents, is summarized and communicated to users in non-technical terminology. t constraints. complementary to t , whose focus is on positive cases -i.e. what can be and is done -this requirement focuses mostly on negative cases. while providing clarity, ml systems need to draw boundaries and limitations on their capabilities and responsibilities, and then communicate such information. when determining the scope of usable security and communicating to users, we suggest three main factors: ( ) limitations, emphasizing what the system cannot do (e.g. delegating access without explicit user actions from a trusted path), is not authorized to do (e.g. sharing chatbot history with unknown third parties), or unwilling to do (e.g. exposing ml models' features and parameters) for technical and non-technical reasons; ( ) boundaries, concerning what the user's actions cannot accomplish; and ( ) expectations, dealing with interactions between users and systems, on what users' expectations for the systems should not be. this requirement may seem counterintuitive, but it is founded on the "fail-safe" principle of computer security [ , ] : the default situation is lack of access -that is, by default, actions and operations are constrained and not allowed to execute. t consistency and stability. for similar user behaviors under similar contextual conditions, ideally, usability-and security-related experience and interactions should be: ( ) similar, within fixed ml systems (data, algorithm, procedure, parameters, input), and ( ) comparable, across different ml systems capable to cover the same contextual conditions in their use cases. we name it "consistency" property. conversely, for the same usability and security methods, when provided with the same user behavior inputs, should respond with similar user experience and interaction. we call this "stability" property. there properties can help users build their own mental models for how security mechanisms and the general ml system work, and align their expectations with the system's responses. note that we controlled the variables ("similar", "fixed", "same") while describing consistency and stability, therefore consistency is not constancy, and stability is not staleness. in fact, the dynamic nature of usable security and the user expectation-system behavior alignment model are both well-known [ ] . the goal of alignment, is to motivate secure user behavior and raise user's trust level in the system, and consistency and stability are inroads to alignment. t reciprocity. leveraging the human tendency to return favors, ml systems in mental health can elicit actions of trust from users, and motivate their secure behaviors and active engagements, as hci research showed [ , , ] : after users receive helpful information from a computer system, they are more likely to provide useful information or actions back to the system. for reciprocity schemes in ml systems in mental health, we identify two stages: ( )initial exchange of reciprocity, where after volunteering helpful information to users, the system prompts user for desirable information or behavior input; and ( )continuous engagement, meaning that after the initial round, if the user reciprocates, the system should aim to maintain exchanges with users, when user behaviors and other contextual conditions warrant so. depending on specific areas where the ml system needs to induce trust and motivate behaviors (e.g. 
having users enable security features, or actively use ml capacities), details of the interaction mechanisms, from the initial offer of help to ongoing engagement patterns, will vary. because reciprocity largely depends on user interactions with the system, it naturally focuses on usability, and has different tradeoff with security for different context models. therefore, any reciprocity schemes must be designed, implemented, and validated judiciously to defend against reciprocity attacks [ ] . good security needs failure modes, and usable security is no exception. with a variety of assets to protect c , many functionalities to perform, and user trust to gain and maintain, ml systems in mental health must have a concrete plan for security failures. these principles lay out a foundation to consider the immediate and long-term aftermath of security incidents and their responses, so ml systems in mental health can retain usable security attributes and rebuild trust with users (t ). r response. previous research [ , ] surveyed security incidents such as user data leaks, but did not address more complex security challenges to ml systems in mental health, whose sensitive and diverse assets, both native and acquired, make juicy targets. therefore, ml systems must have protocols and procedures in place, timely reviewed and revised, and ready to respond to security incidents, to achieve three goals: ( ) evaluate scope and impact of incident, ( ) minimize damages to impacted assets, ( ) investigate and attribute sources of incident, and most importantly, ( ) rebuild trust in users for the system. ( ) through ( ) address immediate actions, while ( ) is a long-term process that ensures ml systems can maintain its stay with user bases in mental health. this principle is related to c , c , and trustworthiness requirements. r provenance and chronology. the usability of security, in its failure mode, entails that security failures can be traced, examined, analyzed, and inform future security decisions, and such need is satisfied by post-incident provenance and chronology. in ml systems for mental health, provenance and chronology should not only supply ( ) a time ordering of system events, technical vulnerabilities or disadvantages, procedural limitations, uncovered edge cases, user interactions, statistics, and likely warning signals leading up to the incident, but also ( ) records of any changes (e.g. content, metadata, mode, appearance) in impacted assets (e.g. manipulated ml model parameter, altered user interface, leaked health history), from when the incident happened, to when it is uncovered. both provenance and chronology can be considered for user-facing purposes as a tool for repair (r ) and to rebuild trustworthiness. r repair. post-security-incident repair has two aspects: ( )repairing the system itself, and ( )repairing users' trust in the system. ( ) is the direct logical next step of r and r with immediate impact and results, while ( ) tends to be long-term, and is more difficult -it needs all the building blocks of trustworthiness to repair users' trust in ml systems impacted by security incidents, especially when incidents concern user data, user-system interaction, or even users' offline behaviors. repairing trust needs to address additional psychological barriers of users, hence harder than building trust at first, but it is still possible when t and t are emphasized and utilized in the repair process. 
our framework is a suggestion, an encouragement, a proposal, and an invitation to the community to start acknowledging and researching usability and security in ml systems for mental health. while our framework is not a standardized rubric, we realize that it may become a foundation for future standards, guidelines, or recommendations by organizations such as nist, iso, or ieee, for usable security in generic interactive ml systems, or specifically in mental health applications. previously, standards were issued on transparency and autonomy in autonomous systems [ ] , and we are sanguine about a general consensus on usable security in ml systems, especially for mental health use cases. we intentionally crafted this framework to be agnostic to ml techniques: hence, we can focus on providing a unified structure that is not only comprehensive enough to cover the current interdisciplinary area between traditional computer security, hci, and ml, but is also flexible enough to accommodate future changes and progress in these areas. we hope this framework can enable researchers and practitioners to: ( ) identify gaps in security and usability between their theoretical capacities, design variances, actual implementations, and real-world usage patterns; and ( ) quickly appraise properties of particular security and usability methods to decide on the most appropriate mechanism for their desired use cases. in addition, our evaluation framework can be used as a reporting rubric targeting regulators, government officials, and policy makers, so they can quickly get all information in one place, in a clear, structured, and comparable manner. when we speak of "practitioners" in the section above, in the specific context of ml systems for mental health, there are broadly two categories that we target: ( ) security practitioners: in general system security contexts, security mechanisms and policies are researched, designed, implemented, tested, maintained, and improved by security professionals. ( ) ml practitioners: in general ml system contexts, ml practitioners research, apply, curate, train, validate, test, maintain, and improve ml models, algorithms, and data. yet, as we discuss usable security in ml systems for mental health, the matter gets more complex: there are more stakeholders, both on the system builders' side and on the system users' side. and on each side, there are multiple considerations, interests, and mental models that come into play. the table below shows the different stakeholders when we build usable security into ml systems for mental health. comparing it with figure , the critical differences between building usable security into general systems versus into ml systems for mental health can be clearly discerned. to summarize: there are more stakeholders on the users' side who deserve usable security for their more diverse needs of the ml system for mental health, and there are more stakeholders on the builders' side who have distinct desires for what they want to do with such a system, and how they wish it to behave. for example, while security and ml practitioners desire different ideal attributes from the system, and those attributes do not necessarily contradict each other, there are trade-offs to make. between "strong defense", with its implications for the privacy of patient information, and "collect data" for training, where more data is usually better, the builders themselves first need to reach a delicate balance.
on the other end, instead of the cohesive and more-or-less predictable and uniform sets of actions normally expected from user models built for general software or ml systems, we now have a diverse set of potential users, with various sets of actions and behaviors that are not usually accounted for in those general purpose software or ml systems. in those general purpose systems, figure shows the path by which security mechanisms and experiences are delivered to end users. behaviors, actions, expectations, and use scenarios of these end users would be captured in user models, and security practitioners would design, build, and deploy security measures and experiences according to those user models. but because of the diverse and varying expectations and actions from distinct groups of end users , such user models would be too narrow and would miss legitimate use actions and behaviors. this is a major reason that we crafted this framework: to properly account for and appreciate the diversity and variety of users and their actions in ml systems for mental health, with the end goal of bringing a usable and secure experience to all. as described in the sections for each of those sub-attributes, the attributes are neither mutually exclusive nor completely independent of each other. instead, there are rich and dynamic interactions between these sub-attributes, both within a single pillar and across different pillars. four major types of interactions are listed below with short examples. ( ) inter-dependence: f and c are interdependent. in this case, without behavior categorization, robustness is next to impossible to plan for or implement; and without robustness measures tested and used in real life, it would be very hard to validate if the behavior categorizations are reasonable or sufficient. ( ) trade-offs: f trades off against the "least common mechanism" principle for secure system design as articulated in [ ] : for security measures to generalize to diverse sets of targeted users and behaviors, commonality increases and distinctions decline. ( ) prioritization: in many scenarios, prioritizing particular principles before others is the most reasonable and sensible course of action. for example, a large-scale online platform delivering automated conversational therapy may prioritize f and f and, at the same time, de-prioritize trustworthiness requirements based on the assumption that people seeking online automated services generally have greater trust in ml systems and technology, and are likely to be technically proficient enough to navigate the security designs put in place. ( ) complements: t and t are complementary: they consider opposite sides of the same issue and, from there, create a comprehensive view and enable balanced and holistic decisions for usable security designs and implementations. these dynamic and interactive relationships carry deep implications for usable security in ml systems for mental health, and we will explore some examples that showcase these interactive and dynamic relationships between the properties in section . we will now apply the four-pillared framework, and share several tangible use cases of ml systems for mental health where we evaluate and examine their usable security needs and profiles.
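before walking through the use cases, here is one rough way such a needs-and-profile summary might be written down for a single system. this is only an illustrative sketch under our own assumptions: the pillar names and priority labels are placeholders, since the numbered attribute labels are not reproduced here.

# an illustrative (not prescriptive) way to note a usable-security profile for
# one ml system in mental health; pillar names and priorities are placeholders.
PRIORITY = {"high": 3, "medium": 2, "low": 1}

def profile_summary(profile: dict) -> str:
    # return a one-line summary ordering pillars by assigned priority
    ordered = sorted(profile.items(), key=lambda kv: PRIORITY[kv[1]], reverse=True)
    return ", ".join(f"{pillar}={prio}" for pillar, prio in ordered)

# example: a large-scale automated conversational therapy platform that
# prioritizes functionality (availability, robustness) over trustworthiness.
chatbot_profile = {
    "context":         "medium",   # assets and threat models still matter
    "functionality":   "high",     # availability and robustness come first
    "trustworthiness": "low",      # assumed (perhaps wrongly) for online users
    "recovery":        "medium",   # breaches of stored records must be repairable
}

if __name__ == "__main__":
    print(profile_summary(chatbot_profile))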
this way, we can concretely demonstrate the practicality of our framework, illustrate the dynamic and interactive relationships between the pillars and their corresponding sub-attributes, and showcase the complex and distinct stakeholder demands for usable security that warrant such a framework. we will elaborate on example with brief comparisons and contrasts to the three other examples, and leave examples to as an exercise for the reader. ( ) a chatbot providing conversation-based therapy to young adults with mental disorders; ( ) auto-diagnosis algorithms of neuro-images for psychiatrists; ( ) personalized matching for providers & patients; and ( ) an ml system analyzing facial and verbal expressions during tele-therapy sessions. in , the chatbot's context is an online automated service, and hence it is more likely to experience high-concurrency requests from many different users with existing mental disorders, and such online large-scale services may also be needed in times of distress (e.g. quarantine during the covid- global pandemic) for parts of the general population. therefore, its usable security mechanisms need to prioritize being available and robust enough to handle more users and wider ranges of behaviors and actions of legitimate users. at the same time, inducing users' trust may not be as important, because we may consider that people willing to use online chatbot services are more trusting towards ml and technology in general. however, we will soon see that if security mechanisms and designs are not usable or robust enough, such an assumption of trust may not be warranted, and whatever trust exists would be drastically diminished. the assets it needs to protect are not only the security of its general software infrastructure, but also its ml algorithms that generate live conversation responses to users: recall the infamous incident of microsoft's chatbot tay on twitter [ ] . caution must be taken to ensure that legitimate users who need the therapy service can easily and readily access it without much hassle, and that malicious users can be fended off so they cannot manipulate the algorithms. while this may seem simple at first, we must properly categorize the behaviors of our potential legitimate users, who have existing mental disorders. take attention-deficit/hyperactivity disorder (adhd) for instance. suppose that the chatbot implements classes of captcha or recaptcha methods -which may include text, image, and sound recognition, as well as text and image matching -to defend against its threat model actors that include bots and malicious users trying to poison its algorithms. while these methods may be effective in defending against these threats, legitimate users with adhd, whose attention spans are usually shorter than those of the general population [ ] , may be unlikely to complete the captchas, especially when several of them come one after another. when security defense designs turn legitimate users away, these users may leave with the idea that such mechanisms were built to trick them and that the system behind them has no genuine intention of providing help; hence, the implied trust they had in the chatbot when they first approached this online automated service is likely to diminish. hence, it would be advisable to also take the trustworthiness requirements seriously, especially the constraint property. one way to demonstrate this is to use clear language or visual images to inform users of failures.
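a minimal sketch of such a fallback flow is shown below, and the concrete message in the next paragraph is one example of the kind of language it could display. this is only an illustration under our own assumptions: the function names, the retry limit, and the alternative verification step are hypothetical and not drawn from any particular system.

# illustrative sketch only: a challenge flow that limits consecutive captcha
# attempts and falls back to a clearly worded alternative, so legitimate users
# (e.g. those with shorter attention spans) are not silently turned away.
FALLBACK_MESSAGE = ("Sorry, we could not tell if you are a human or a bot. "
                    "Do you want to try another way?")

def verify_user(present_captcha, present_fallback, max_attempts: int = 2) -> bool:
    # present_captcha() and present_fallback() are app-specific callbacks that
    # return True when the user passes the respective check (hypothetical API)
    for _ in range(max_attempts):
        if present_captcha():
            return True
    # constraint property: say plainly what the captcha could not do,
    # then offer a different, lower-friction path instead of a dead end
    print(FALLBACK_MESSAGE)
    return present_fallback()

# example wiring with stub callbacks
if __name__ == "__main__":
    ok = verify_user(lambda: False, lambda: True)
    print("verified:", ok)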
for instance, when a user's attention span is too short to successfully finish a series of recaptcha challenges, a message displays: "sorry we could not tell if you are a human or bot. do you want to try another way?" this or similar messages could communicate to legitimate users that the system genuinely intends to provide services, the recaptchas are there because it is a security design, not a farce or trick to turn them away, and there are certain things that these recaptchas are not capable of doing. further thoughts bring more usable security considerations to the discussion on example . should the chatbot store any records of its user interactions, so that human providers and caregivers (e.g. parents or legal guardians) of these young adults could monitor their progress, provide better diagnosis, treatment, and care, then we are onto more complex scenarios. there are now two more groups of users to consider, and how to provide usable security for them is a crucial challenge. moreover, because now there are stored user records, there is an additional asset to protect, and the recovery principles need to be elevated to higher priorities, especially repairing users' trust in the service in the event of a breach or leak. this is where flexibility also comes into the picture: if builders of the chatbot had not originally considered this sharing services, and only later decided to add it, flexibility of previous security capabilities to accommodate the additional security requirements that come with sharing user records, is extremely important. in a base case, even when the chatbot is originally built to not only converse with patients, but also store their records and allows them to share their records with their providers and caregivers, clarity of usable security designs that are informed by behavior categorization would be integral. for instance, some young adult patients may decide to simply share their passwords with their providers and caretakers for the latter groups to look at their chat records. however, if these patients are in the u.s., this simple act may land their providers and caretakers in legal trouble: because a shared password, even a voluntarily shared one, counts as unauthorized access by the computer fraud and abuse act [ ] . preempting such behaviors would greatly inform usable security decisions when designing and building this chatbot. for example, builders may decide to use methods other than passwords to check for user authorization and authentication status; to utilize the constraint criteria, and outwardly warn users to not share their passwords even with trusted providers and caregivers; or to add particular terms in the end-user licensing agreement, security & privacy policies, or terms & services documents, and specify cases where the patient could share passwords; or to build a security measure so that patients can delegate access to their records to authenticated providers and caregivers. to choose the most suitable usable security mechanisms or combinations of such mechanisms, builders of the chatbot would need to deliberate on which contexts, and especially which threat models they decide to focus on. in comparison, example has a very specific and focused profile of targeted users (psychiatrists), so the inter-dependent properties of behavior categorization and robustness would be straightforward to analyze and design, and the assets and threat models are also relatively clearly defined and direct. 
meanwhile, because such a system is used for medical diagnosis, all attributes related to trustworthiness need to be prioritized: the builders cannot assume that psychiatrists trust the ml system's decisions. moreover, while the threat models are relatively simple compared to examples and , the system still needs to induce trust from the psychiatrists: how do they know that it is they, and not the malicious attackers described in the threat models, who see the images, patients' information, and the algorithm outputs? to address these issues, some security designs may include: ml explanation options that accompany each diagnosis; a side bar that shows access activities; or a device-based two-factor authentication check. again, it is up to both the ml and security practitioners who build this system to decide on the specific contexts and threats they would like to prioritize. example also has as straightforward a behavior profile as example , but the threat models could be tricky: depending on what the builders of the system decide to gather from both patients and providers for the match, the assets that the system needs to protect could span a rather wide range. meanwhile, because of the usual one-on-one nature of patient-provider relationships, in contrast to example , de-prioritizing the availability of security measures to large numbers of online users could be sensible, and the system might even be able to afford to use more time-, memory-, or computationally-complex security mechanisms that are nonetheless usable for both providers and patients of the service. for example, it could incorporate reciprocity into the human-system interaction process by engaging users in short q&a games about secure behaviors -which, by the way, could also induce trust from users in the system's security and fulfill part of the trustworthiness requirement. but on the ml front, trustworthiness here is similar to the premise of example but in the spirit of example . while patients and providers who choose to use an ml-powered matching service could be assumed to have a greater degree of general trust in ml and technology, the same level of trust cannot be assumed in the specific matching decisions: "how and why did this black-box know that i would be a good fit for this patient/provider?" would be the question to answer for every patient or provider who uses the service. example involves more diverse user groups (patients, providers, likely other caregivers, and potentially policy-makers). hence, the assets that need to be protected are more varied and diverse, the threat models more complex, the recovery scenarios more important, and the usable security mechanisms may need to make trade-offs between availability and flexibility while still being functional when processing live audio and video data, which is another subtle constraint on the complexity of usable security mechanisms. similar to examples and , the inevitable question of trustworthiness would arise around the ml system's decision rationales and explanations, and there is also an incentive on the builders' end to ensure that the algorithms and models powering the system are not being manipulated. because the information being processed and analyzed by the system is largely private and sensitive, convincing different users of the effectiveness and strength of the security mechanisms is also an important task.
compared to example , where there are clearer priorities, this system poses a full set of challenges for usable security design, implementation, and evaluation. in our work, we presented four categories of desired properties -based on context, functionality, trustworthiness, and recovery -to systematically frame and evaluate usable security in ml systems for mental health. we discussed those properties' intents, rationales, and sources at the intersection of security, usability, ml, and mental health. we propose that ml systems in mental health be evaluated by way of this framework for security and usability, in different phases of the computer system life cycle. we have analyzed, structured, and presented several examples of ml systems in mental health in this framework, and for next steps, we plan to evaluate more real-life ml systems in mental health, preferably similar to the four described examples, so we can test, validate, and improve our framework and criteria. simultaneously, we also plan to interview builders of these ml systems in mental health, to understand their awareness of, thought processes behind, and decision rationales for usable security in the systems they designed and built. because the framework covers the computer system life cycle, while we prefer already deployed, large-scale systems, we are also happy to examine systems in early stages of the cycle. we plan to publish results on websites where this interdisciplinary community can also submit their own framework evaluation results. in a deeper dive, our future work will explore a tiered approach to usable security for ml systems, inspired by classic security literature [ ] , and meanwhile further examine interactions -e.g. trade-offs, enhancements, overlaps from different perspectives, complements, and interdependence -between desirable usability and security properties for ml systems in mental health.
references:
ieee standard glossary of software engineering terminology
inclusive persuasion for security software adoption
users are not the enemy
improving information security awareness and behaviour through dialogue, participation and collective reflection. an intervention study
usable security: revealing end-users comprehensions on security warnings
on the assessment of robustness
in search of usable security: five lessons from the field
aleksander madry, and alexey kurakin. . on evaluating adversarial robustness
towards evaluating the robustness of neural networks
even experts deserve usable security: design guidelines for security management systems
overview of the national laws on electronic health records in the eu member states
european commission. . data protection in the eu
vulnerability, sharing, and privacy: analyzing art therapy for older adults with dementia
cybersecurity and infrastructure security agency (cisa), united states department of homeland security
is usable security an oxymoron? interactions
enhancing performance prediction robustness by combining analytical modeling and machine learning
a model-based approach for robustness testing
how users reciprocate to computers: an experiment that demonstrates behavior change
persuasive technology: using computers to change what we think and do
european union agency for cybersecurity
information technology - trusted platform module library - part : architecture
ergonomics of human-system interaction - part : human-centred design for interactive systems
practical unix and internet security
effects on employees' information security abilities by e-learning
consolidating principles and patterns for human-centred usable security research and development
security and usability: analysis and evaluation
data breaches: user comprehension, expectations, and concerns with handling exposed data
towards robust experimental design for user studies in security and privacy
formal models for computer security
computer security
online privacy and aging of digital artifacts
threat modeling as a basis for security requirements
attention-deficit/hyperactivity disorder
approximating saml using similarity based imprecision
network security & robustness
trusted computer system evaluation criteria
hipaa security rule crosswalk to nist cybersecurity framework
the department of justice systems development life cycle guidance document
usable cybersecurity
framework for improving critical infrastructure cybersecurity
hci and security systems
a brief introduction to usable security
an empirical study of the cobb-douglas production function properties of software development effort
from usability to secure computing and back again
toward a secure system engineering methodology
the protection of information in computer systems
transforming the 'weakest link' - a human/computer interaction approach to usable and effective security
if a is the answer, what was the question? an edgy naif's retrospective on promulgating the trusted computer systems evaluation criteria
ieee standard review - ethically aligned design: a vision for prioritizing human wellbeing with artificial intelligence and autonomous systems
machine learning in mental health: a scoping review of methods and applications
why johnny still can't encrypt: evaluating the usability of email encryption software
threat modeling: designing for security
usable security: oxymoron or challenge
explainability fact sheets: a framework for systematic assessment of explainable approaches
on the challenges in usable security lab studies: lessons learned from replicating a study on ssl warnings
building robust systems an essay
evaluating user-computer interaction: a framework
building robust learning systems by combining induction and optimization
usable security
is usable security an oxymoron?
united states department of health and human services
usenix. . usenix soups conference page
microsoft created a twitter bot to learn from users. it quickly became a racist jerk
security controls for computer systems: report of defense science board task force on computer security
aligning security and usability
reciprocity attacks
user-centered security
key: cord- -v gfh m authors: maghdid, halgurd s.; ghafoor, kayhan zrar title: a smartphone enabled approach to manage covid- lockdown and economic crisis date: - - journal: nan doi: nan sha: doc_id: cord_uid: v gfh m the emergence of the novel covid- is causing an overload in health systems and a high mortality rate.
the key priority is to contain the epidemic and reduce the infection rate. in this context, many countries are now in some degree of lockdown to ensure extreme social distancing of the entire population and hence slow down the epidemic spread. further, authorities use a case quarantine strategy and manual second/third contact-tracing to contain the covid- disease. however, manual contact tracing is a time-consuming and labor-intensive task which tremendously overloads public health systems. in this paper, we develop a smartphone-based approach to automatically and widely trace the contacts of confirmed covid- cases. in particular, the contact-tracing approach creates a list of individuals in the vicinity and notifies the contacts or officials of confirmed covid- cases. this approach not only makes individuals aware that they are in proximity to an infected area, but also tracks the incidental contacts that the covid- carrier might not recall. thereafter, we develop a dashboard to provide a plan for government officials on how lockdown/mass quarantine can be safely lifted, and hence help tackle the economic crisis. the dashboard is used to predict the lockdown level of an area based on the collected positions and distance measurements of the registered users in the vicinity. the prediction model uses the k-means algorithm as an unsupervised machine learning technique for lockdown management. in an unprecedented move, china locked down the megacity of wuhan, in which the novel coronavirus was first reported, in the hope of stopping the spread of the deadly coronavirus. during the lockdown, all railway, port and road transportation was suspended in wuhan city. with the increasing number of infections and fast person-to-person spreading, hospitals were overwhelmed with patients. later, the disease was identified in many other countries around the globe [ ] , [ ] . subsequently, the world health organization (who) announced that the virus can cause a respiratory disease with a clinical presentation of cough, fever and lung inflammation. as more countries experienced dozens of cases or community transmission, the who characterized the covid- disease as a pandemic. halgurd s. maghdid is with the department of software engineering, faculty of engineering, koya university, kurdistan region-f.r.iraq. first.last@koyauniversity.org. kayhan zrar ghafoor is with the department of software engineering, salahaddin university-erbil, iraq; school of mathematics and computer science, university of wolverhampton, wulfruna street, wolverhampton, wv ly, uk. kayhan@ieee.org. *kayhan zrar ghafoor is the corresponding author. kayhan@ieee.org. researchers can access the implementation and programming code at https://github.com/halgurd /lockdown covid in such an unprecedented situation, doctors and health care workers are putting their lives at risk to contain the disease. further, in order to isolate infected people and combat the outbreak, many hospitals were converted to covid- quarantine wards. moreover, a surge of covid- patients has introduced long queues at hospitals for isolation and treatment. with such a high number of infections, emergency responders have been working non-stop sending patients to the hospital, and overcrowded hospitals have refused to take in more patients. for instance, recently in italy, where medical resources are in short supply, hospitals have had to give priority to people with a significant fever and shortness of breath over others with less severe symptoms [ ] .
as the covid- continues to spread, countries around the globe are implementing strict measures to intensify the lockdown, from mass quarantine to city shutdown, to slow down the fast transmission of the coronavirus [ ] . during the lockdown, people are only allowed to go out for essential work such as purchasing food or medicine. ceremonies and gatherings of more than two people are not permitted. these strict quarantine rules allow only a few, including delivery drivers providing a vital lifeline, to move around the city. on the other hand, a few countries, such as japan, have declared a state of emergency in many cities in an attempt to tackle the spread of the virus. although covid- started as a health crisis, it is possibly the gravest threat to the world economy since the global financial crisis [ ] . the covid- epidemic affects all sectors of the economy, from manufacturing and supply chains to universities. it also affects businesses and daily lives, especially in countries where covid- has hit the hardest. supply chain shortages have knock-on effects on the economic sector and the demand side (such as trade and tourism). this creates a supply constraint for producers and a restraint on consumer demand, which may lead to a demand shock due to psychological contagion. in order to prevent such widespread fallout, central banks and governments have been rolling out emergency measures to reassure businesses and stabilize financial markets to support the economy in the face of covid- . currently, most countries are in the same boat, with leading responsibility falling on the group of twenty and international organizations [ ] . to meet this responsibility, many companies and academic institutions around the world have made efforts to produce a covid- vaccine. however, health experts state that it may take time to produce an effective vaccine. as an effective vaccine for covid- is probably not going to be on the market until the beginning of next year, management of the lockdown is an imperative need. thus, public health officials combat the virus by manually tracking the recent contact history of positive covid- cases. this manual contact tracing is very useful at the early spreading stage of the virus. however, when the number of confirmed cases increased tremendously in some countries, manual contact tracing of each individual became labor-intensive and required huge resources [ ] . for example, an outbreak of covid- at a funeral ceremony on an avenue in erbil, kurdistan region, left the regional government with hundreds of potential contacts. this situation, and many other scenarios with massive numbers of cases, burden a government trying to manually track all contacts [ ] . it is risky when health authorities cannot easily trace recent covid- carrier cases, since the probability of occurrence and the impact of spread can then hardly be measured. technology can potentially be useful for digital contact-tracing of positive coronavirus cases. smartphones can use wireless technology to track people when they are near each other. in particular, when someone is confirmed covid- positive, the status on their smartphone will be updated, and the app will then notify all phones in the vicinity. for example, suppose someone tests positive for covid- and stood near a person in the mall earlier that week; the covid- carrier would not be able to recall that person's name for manual contact tracing. in this scenario, a smartphone contact-tracing app is a very promising way to notify that person [ ] .
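one possible shape of the bluetooth-based matching just described is sketched below. it is only an illustration under our own assumptions: the class, identifiers, and "public database" here are hypothetical stand-ins, and real protocols add rotating identifiers, signal thresholds, and privacy protections that are omitted.

# simplified sketch of decentralized contact matching with anonymous identifiers:
# each phone stores the identifiers it advertised and the ones it overheard, and
# later checks the overheard list against a public list from diagnosed users.
class Phone:
    def __init__(self, name: str):
        self.name = name
        self.sent = set()        # identifiers this phone advertised
        self.heard = set()       # identifiers overheard from nearby phones

    def advertise(self, ident: str) -> str:
        self.sent.add(ident)
        return ident

    def overhear(self, ident: str) -> None:
        self.heard.add(ident)

    def check_exposure(self, public_positive_ids: set) -> bool:
        # True if any overheard identifier was later reported by a diagnosed user
        return bool(self.heard & public_positive_ids)

# example: b overhears a's identifier; a is later diagnosed and uploads its list
a, b = Phone("a"), Phone("b")
b.overhear(a.advertise("id-123"))
public_db = set(a.sent)              # uploaded with the patient's consent
print(b.check_exposure(public_db))   # True -> b is alerted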
this automated virus tracking approach could really transform the ability of governments and health authorities to contain and control the epidemic. in this situation, a dashboard is required to assist governments and health authorities in predicting when lockdown and self-quarantine will end. this research first reviews the state-of-the-art solutions to combat covid- . then, we develop a smartphone-based approach to automatically and widely trace the contacts of confirmed covid- cases. in particular, the contact-tracing approach creates a list of individuals in the vicinity and notifies the contacts or officials of confirmed covid- cases. this approach not only makes individuals aware that they are in proximity to an infected area, but also tracks the incidental contacts that the covid- carrier might not recall. thereafter, we develop a dashboard to provide a plan for government officials on how lockdown/mass quarantine can be safely lifted, and hence help tackle the economic crisis. from a technical standpoint, we summarise the most important contributions of this paper as follows: ) we build a tracking model based on the positional information of registered users to conduct contact tracing of confirmed covid- cases. ) we propose a smart lockdown management approach to predict the duration of a lockdown. ) in order to notify the contacts of confirmed cases, we also develop a notification model to cluster lockdown regions. the rest of this paper is organized as follows. section ii provides a literature review on recent advances in ai systems developed for covid- detection. this is followed by an overview of the proposed approach and details of the designed algorithm in section iii. section iv presents the experiments which are conducted in the paper. finally, the last section concludes the paper. in [ ] , the authors modeled how covid- spreads over populations in countries in terms of the transmission speed and of containing its spread. in the model, r represents the reproduction number, which is defined as the ability of the virus to infect other people in a chain of contagious infection. infected individuals rapidly infect a group of people over a very short period of time, which then yields an outbreak. on the contrary, the infection would be under control if each infected person infects, on average, less than one other person. this is exactly what happens in fig. : when people (black color) come into contact with an infected person (red color), the infection spreads rapidly. one important aspect is that how the number of infected people evolves depends on several factors, such as the number of vulnerable people in the communities, the time it takes for a person without symptoms to recover, and the social contacts and the possibility of infecting them with the coronavirus. further, another factor that affects the fast spreading of the coronavirus is the frequency of visits to crowded places such as malls and minimarkets. thus, governments and public health authorities are responsible for managing and planning a suitable way to contain the epidemic. moreover, countries at the early stage of virus spread typically need to control the epidemic by isolating and testing suspected cases, tracing their contacts, and quarantining those people in case they are infected. the wider the scale of testing and contact tracing, the better the chance of containment.
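the effect of the reproduction number described above can be illustrated with a toy branching-process calculation; the generation count and seed case below are arbitrary choices made only for this example.

# toy illustration of how the reproduction number shapes an outbreak:
# each generation, every active case infects r new people on average.
def expected_cases(r: float, generations: int, seed: int = 1) -> list:
    # expected new cases per generation for a simple branching process
    cases, current = [], float(seed)
    for _ in range(generations):
        cases.append(round(current, 1))
        current *= r
    return cases

if __name__ == "__main__":
    print("r = 2.5:", expected_cases(2.5, 6))   # grows quickly -> outbreak
    print("r = 0.8:", expected_cases(0.8, 6))   # shrinks -> dies out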
in the case of covid- , research studies have been conducted on containing or controlling the fast spread, and hence on helping governments and societies end this epidemic. in [ ] , the authors investigated the importance of isolating confirmed covid- cases, which could play a key role in controlling the disease. they utilized a mathematical model to measure the effectiveness of this strategy in controlling the transmission speed of covid- . to achieve this goal, a stochastic transmission model is developed to overcome the fast person-to-person transmission of covid- . according to their research study, controlling virus transmission is possible within weeks or by a threshold of cumulative cases. however, controlling the spread of the virus using this mathematical approach is highly correlated with other factors like the pathogen and the reaction of people. one key tool for tracking infected people and predicting the end of a lockdown is contact tracing. when a patient is diagnosed with an infectious disease like covid- , contact tracing is an important step in slowing down the transmission [ ] . this technique seeks to identify people who have had close contact with infected individuals and who therefore may be infected themselves. this targeted strategy reduces the need for stay-at-home periods. however, manual contact tracing is subject to a person's ability to recall everyone they have come in contact with over a two-week period. in [ ] , the authors exploited the cellphone's bluetooth to constantly advertise the presence of people. these anonymous advertisements, named chirps in bluetooth, do not contain positional or personally identifiable information. every phone stores all the chirps that it has sent and overheard from nearby phones. their system uses these lists to enable contact tracing for people diagnosed with covid- . this system not only traces infected individuals, but also estimates the distance between individuals and the amount of time they spent in close proximity to each other. when a person is diagnosed with covid- , doctors would coordinate with the patient to upload all the chirps sent out by their phone to the public database. meanwhile, people who have not been diagnosed can have their phones do a daily scan of the public database to see if they have overheard any of the chirps used by people later diagnosed with covid- . this indicates that they were in close, prolonged contact with that anonymous individual. fig. shows the procedure of exchanging anonymous ids among users for contact tracing. as stated in the previous section, manual contact-tracing is a labor-intensive task. in this section, we detail each part of the proposed smartphone-based digital contact-tracing shown in fig. . the main idea of the proposed framework in fig. is to enable digital contact-tracing to end the lockdown and at the same time prevent the virus from spreading. the best thing to do seems to be to let people go out for their business; if anybody tests positive for covid- , we would be able, through the proposed framework, to trace everybody in contact with the confirmed case and manage the lockdown and mass quarantine. this helps prevent the spread of the virus to the rest of the people. the first step of the proposed contact-tracing model is the registration of users. there is no doubt that registration and coverage of a high percentage of the population are very significant for effective pandemic control.
users provide information such as name, phone number, post code, and the status of the covid- disease (positive, negative or recovered). the effectiveness of the application and of digital contact tracing depends on two factors: speed and coverage. for the proposed framework, we utilize a global navigation satellite system (gnss) receiver for outdoor environments, whereas bluetooth low energy is used indoors. speed depends on reducing the time required for contact tracing from a few days to hours or minutes. the more people register in the system, the better the performance of the system in terms of both speed and coverage of contact tracing. in the second step, a global positioning system (gps) receiver is used by the proposed model to track either individuals or a group of people visiting a common place. the gps service class updates user coordinates to the database every few seconds. once a registered user reports being infected with covid- , their test result is sent to the public database on a central computer server. other registered users will regularly check the central server for possible positive covid- cases they were in contact with in the past weeks. the server is responsible for comparing the infected id with its list of stored ids. a push notification will be sent, by the server, to those who were in contact with a person who tests positive. it is important to note that the only information revealed to the central server is an id of the phone. firebase cloud messaging is used to send push notifications to multiple devices even when the apps are paused or running in the background. many apps send push notifications, which indicate an alert to the users. this happens when a person is approaching someone who is infected with covid- or is near a lockdown area. in order to protect the privacy of those who have the coronavirus, we only include an alerting message in the push notification. this certainly would be very useful for the entire population to make informed decisions about not getting close to a covid- area. however, this notification would help the public health professionals rather than replace them. the proposal also includes a lockdown prediction model. the model works based on the collected geographic information and the crowding level of the registered users in the system. in this study, k-means, an unsupervised machine learning algorithm, is used to cluster the users' position information and predict whether the area should be locked down or not based on some empirical thresholds. the results of both scenarios are shown in fig. . this section presents the details of how the proposed approach is implemented. the proposal includes two main parts. the first is an application deployed on android-based smartphones, which is used by the users and tracks/sends their mobility information to the system. the second is a web-portal (including a comprehensive dashboard) to monitor the visited areas and predict whether they should be locked down or not. -an android application is implemented on the smartphone. the application lets the users register their information; part of this information is automatically captured through the application without user interaction. the covid- status includes three options, which are covid- , none covid- , and recovered. fig. shows a snapshot of the application form for the registration process. -once the users have completed the registration process, they can enter the position tracking model. the tracking model sends the user's position information to the database of the system and also shows a google map of their positions, as shown in fig. b. -besides this, the users can also receive notifications or alerts about areas which have been visited by infected users. the notification works in the background, i.e. the user may have paused the application and be using another application on the smartphone. however, when the user opens the application and enters an infected area, they will receive the alert dialog. fig. c and fig. d show an example of the notification and the alert dialogue. the notification and alert dialogue models are also configured for both outdoors and indoors. for example, for outdoors, the gnss position information of the users is used to measure the distance between any two users' positions, and if the distance is less than meters then the notification or the alert dialog is raised. however, for indoors, the application scans for bluetooth devices in the vicinity, and the result of the scan is matched against the mac addresses pre-registered in the system. if the matched mac addresses belong to covid- or recovered cases, then the notification model and the alert dialog will notify the users about having covid- or recovered users in the scan area. a web portal for the system's administrators is designed and implemented using html , php, javascript, and the google map api. this part of the system monitors and traces the registered users only in terms of whether the areas which have been visited by users should be locked down or not. to this end, an unsupervised machine learning (uml) algorithm has been implemented in the system. there are several uml algorithms, including neural networks, anomaly detection, clustering, etc. however, for this system, the k-means clustering algorithm is used to predict the lockdown decision for the visited area. the k-means algorithm first reads the tracked users' position information and their covid- status. then, in the next step, it calculates the centroid positions of the areas based on the dasv seeding method. the dasv method is a good algorithm for selecting the best centroid position of a set of nearest positions in the vicinity. then, the centroid positions will be updated based on how near the positions are to each of them. the pseudo code of the k-means clustering algorithm is shown in algorithm (algorithm : k-means clustering pseudo code fragments: for j ← k to k n do; assign the position to that cluster; a new centroid = mean of all positions assigned to that cluster). once the process of clustering the tracked users' position information has completed, a set of clusters is produced. then, for each cluster, the distances between the positions of the different users are calculated. this is to calculate how many times the users in the vicinity approach each other (from now on called aeo). for this study, five users (user a, user b, user c, user d, and user e) participated in the system in two different areas in the usa. therefore, two different scenarios with the five users were conducted for the k-means algorithm, as shown in figure . in the first scenario the users are walking and are located in the denver area in colorado, usa, while in the second scenario they are located in the aspen area in colorado, usa. a threshold for the approaching distance has been initialized to meters, i.e. if user a has approached within around meters of user b, or c, or d, or e, it means the users are too near to other users.
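a compact sketch of the clustering-and-threshold logic described above is given below. it is only an illustration under our own assumptions: it uses plain k-means from scikit-learn rather than the dasv-seeded variant, the distance and count thresholds are placeholders since the exact values are not reproduced in the text, and it counts close pairs in a single snapshot rather than approach events over time.

# illustrative sketch of the lockdown prediction step: cluster user positions,
# then count close approaches (aeo) within each cluster against thresholds.
# assumes scikit-learn and numpy are available; thresholds are placeholders.
import numpy as np
from sklearn.cluster import KMeans

def predict_lockdown(positions, k=2, dist_threshold_m=10.0, aeo_threshold=5):
    # positions: array of shape (n, 2) with projected x/y coordinates in metres
    positions = np.asarray(positions, dtype=float)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(positions)
    decisions = {}
    for c in range(k):
        pts = positions[labels == c]
        # count pairs of users closer than the distance threshold (aeo events)
        aeo = sum(np.linalg.norm(pts[i] - pts[j]) < dist_threshold_m
                  for i in range(len(pts)) for j in range(i + 1, len(pts)))
        decisions[c] = "lockdown" if aeo > aeo_threshold else "no lockdown"
    return decisions

# example with two synthetic areas: one crowded, one dispersed
crowded = np.random.normal(loc=[0, 0], scale=3, size=(5, 2))
spread = np.random.normal(loc=[500, 500], scale=80, size=(5, 2))
print(predict_lockdown(np.vstack([crowded, spread])))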
for the two scenarios, if aeo is greater than , the system assumes the area is too crowded and will predict that the area should be locked down. however, if the value of aeo is less than times, it means the area should not be locked down. for the trial experiments, the model predicts that the denver area in the first scenario should be locked down, since, while walking in the area, the five users approached each other times and passed the threshold (i.e. meters). however, the same trials were run in parallel for the second scenario, and the model predicted that the aspen area does not need to be locked down, since the users walked far from each other. the results of both scenarios are shown in figure . at the emergence of covid- , many countries worldwide commonly practiced social distancing, mass quarantine and even strict lockdown measures. smart lockdown management is a pressing need to ease lockdown measures in places where people are practicing social distancing. in this paper, we developed a smartphone-based approach to inform people when they are in proximity to an area infected with covid- . we also developed a dashboard to advise health authorities on how a specific area can safely get people back to their normal life. the proposed prediction model uses the positional information and distance measurements of the registered users in the vicinity. the government and public health authorities would be able to benefit from the proposed dashboard to get the latest statistics on covid- cases and lockdown recommendations for different areas. the weak point of this study is the privacy issue of tracking the position information of the users. this issue could be addressed by applying encryption algorithms in the near future. however, the proposed system is significant for mitigating the economic crisis and easing lockdown issues.
references:
deep learning-based model for detecting novel coronavirus pneumonia on high-resolution computed tomography: a prospective study
novel coronavirus in the united states
lockdowns can't end until covid- vaccine found, study says
can we compare the covid- and crises?
( , april) what is contact tracing
number of covid- cases reaches in kurdistan region, iraq's total now
apple and google partner on covid- contact tracing technology
what we scientists have discovered about how each age group spreads covid-
feasibility of controlling covid- outbreaks by isolation of cases and contacts
safe paths: a privacy-first approach to contact tracing
conflict of interest: the authors declare that they have no conflict of interest. moreover, this research was not funded by any funding agency.
key: cord- -dx bbeqm authors: simmhan, yogesh; rambha, tarun; khochare, aakash; ramesh, shriram; baranawal, animesh; george, john varghese; bhope, rahul atul; namtirtha, amrita; sundararajan, amritha; bhargav, sharath suresh; thakkar, nihar; kiran, raj title: gocoronago: privacy respecting contact tracing for covid- management date: - - journal: j indian inst sci doi: . /s - - - sha: doc_id: cord_uid: dx bbeqm the covid- pandemic is imposing enormous global challenges in managing the spread of the virus. a key pillar to mitigation is contact tracing, which complements testing and isolation. digital apps for contact tracing using bluetooth technology available in smartphones have gained prevalence globally. in this article, we discuss various capabilities of such digital contact tracing, and its implications for community safety and individual privacy, among others.
we further describe the gocoronago institutional contact tracing app that we have developed, and the conscious and sometimes contrarian design choices we have made. we offer a detailed overview of the app, backend platform and analytics, and our early experiences with deploying the app to over users within the indian institute of science campus in bangalore. we also highlight research opportunities and open challenges for digital contact tracing and analytics over temporal networks constructed from them. contagious viral diseases such as the sars-cov ( ), h n ( ), mers-cov ( ), and sars-cov- ( ) have resulted in global epidemic outbreaks and placed a massive burden on public health systems around the world. these pandemics have cascading effects that result in irreparable consequences to economies and quality of life. the recent sars-cov- or covid- pandemic has triggered national and regional lockdowns across the world to curb the spread of the virus. with incubation periods that last days and with a significant fraction of asymptomatic carriers, the proliferation of the disease has been hard to detect and localize. further, testing of populations at a large-scale has proved challenging due to limited testing kits, well-trained health-care professionals, and funds in emerging economies . to tackle this problem, governments and health workers use contacttracing of infected social distancing: social distancing is the practice of maintaining physical distance between individuals to prevent the spread of face-to-face communicable diseases. a . - m distance is recommended for covid- . tracing is the process of identifying people might be at risk due to physical interactions with a disease carrier. individuals to identify those who may have come in contact with them, also called primary contacts. these primary contacts are then quarantined and/ or tested depending on their symptoms. testing, tracing, and isolation form essential components of covid- management, besides preventive measures like wearing masks, practising social distancing , and washing hands . traditional methods of contact tracing are often laborious and may be erroneous due to recall biases , . also, human activity patterns often involve interactions with strangers, especially when travelling, which makes it difficult to identify contacts using traditional methods. as a large fraction of the population owns smartphones, countries around the world, including india, have attempted to use digital contact tracing , , . mobile apps that use bluetooth technology are deployed to record close interactions between users. these bluetooth low-energy (ble) apps typically advertise a unique device id, j. indian inst. sci. | vol xxx:x | xxx-xxx | journal.iisc.ernet.in which can be recognized by other nearby devices with the app that scan for and save these advertised ids, also called contacts. this information is typically stored on the local device; if a user tests positive, their bluetooth contacts are uploaded to a central database and their contacts are alerted. this can dramatically reduce the time required for contact tracing from days to potentially hours, thereby mitigating the spread of the virus . examples of such national-scale apps include aarogya setu in india, tracetogether in singapore, covidsafe in australia, covid alert in canada, corona-warn-app in germany, etc. however, there are limitations to digital contact tracing. 
these constraints include the low reliability and asymmetry of bluetooth technology in detecting nearby users , , , ; low accuracy of the proximity distance between users to help distinguish nearby and farther off users , ; high degree of adoption required for digital contact tracing to be effective , ; and the inability to locate secondary and tertiary contacts until the primary and secondary contacts test positive, respectively. it is hence still important to use complementary digital contact tracing with manual methods. in this article, we describe gocoronago (gcg), a digital contact tracing app for institutions, which attempts to address these limitations. a key distinction of our approach is to collect the contact trace data of devices into a centralized database, continuously, irrespective of if or when a person is diagnosed as covid positive. this proximity data of all app users are used to build a temporal contact graph, where vertices are devices, and edges indicate proximity between devices for a certain time period and with a certain bluetooth signal strength. this approach has several benefits. when a gcg user is tested positive for covid- , we use graph algorithms to rapidly identify primary, secondary, and other higher-order contacts, based on who guidelines . further, even if the bluetooth scans were missed by the infected user, successful scans by other proximate devices can be used to alert the relevant contacts, increasing the reliability of detection. in addition, centralized digital contact tracing has the potential to estimate the state of the population using network-based seir models, which can be used to assign risk scores and prioritize testing , , . of course, centralized contact data collection has its downsides, primarily, the privacy implications of tracking the interactions between a large number of individuals. we take several precautions to mitigate this. one, the app is designed for deployment only within institutions and closed campuses, and not at a city, regional, or national scale. the data collected are owned by the host institution and not by a central authority. two, users do not have to share any personal information, and devices are identified using a randomly generated id. sharing gps location or their phone number is voluntary and through opt-in. last, deanonymization of data is limited to covid- contact tracing and, by design, requires multiple entities to cooperate, and is overseen by an advisory board with a broad representation from the institution. we discuss these pros and cons in more detail later. besides a centralized data collection approach, we also conduct experiments to understand the impact of various smartphone devices and the environment on the bluetooth signal strength to better ascertain the proximity between devices. we also send proactive messages for users to enable custom bluetooth settings in their smartphones to improve reliability. the use of the gcg app within an institutional setting, with data collection and usage governed by the organization, may lead to higher adoption of the app and enhance its effectiveness in contact tracing. this article examines the design rationale, architecture, and our experience in deploying the gocoronago digital contact tracing app as part of a pilot at the indian institute of science (iisc). it also discusses the challenges and opportunities in improving the utility of digital contact tracing. the rest of the article is organized as follows: in sect. 
, we review digital contact tracing and provide an overview of a few popular covid- apps. section provides details of the app design and the backend architecture. in sect. , we describe various analytics, including temporal contact network algorithms, for contact tracing, and for providing feedback to app users. finally, sect. summarizes our experience with deploying the app at iisc and highlights some of the opportunities and challenges of digital contact tracing. j. indian inst. sci. | vol xxx:x | xxx-xxx | journal.iisc.ernet.in background and related work . contact tracing infectious diseases, that spread through personto-person interactions, can be contained by tracking their sources and quarantining the individuals who are or may be affected. this is typically done using physical interviews, which try to determine the places visited and the people met by the patient . in some cases, the location history of the patients is shared by cities and public health agencies on websites and mobile apps to allow others who were in the vicinity at that time to take precautions. this form of contact tracing relies heavily on one's memory and collecting such data manually is cumbersome. contact tracing is crucial, especially for viruses such as the sars-cov- that exhibit high transmission rates, low testing rates, long incubation times, and a significant fraction of asymptomatic carriers, who could infect other susceptible individuals , , . digital contact tracing, on the other hand, involves the use of technology to keep track of the individuals who came in close proximity with each other. it has been shown to be effective in preventing the spread of communicable diseases in livestock , , but experiments involving human populations have been limited . the scale at which covid- has spread has led to the use of bluetooth and gps-based contact tracing applications on mobile phones. such apps help individuals automatically keep a record of the places they visited and the people they met, along with the timestamps. this permits us to build contact neighborhoods that can be used to alert or quarantine the concerned individuals and identify potentially risky interactions. most digital contact tracing (dct) apps for covid- rely on bluetooth technology available on smartphones. in addition, a few apps collect the gps location of users. the rapid spread of the covid- virus has led to the development of a variety of smartphone apps around the world, which are variants on this theme. examples include both national apps like aarogya setu (india), nhsx (uk), and covid safe (australia), as well as those proposed by institutions, like novid (cmu) and safepaths (mit). a review of contact tracing apps can be found in , , , , and their features are contrasted in table . at a broad level, these apps scan and advertise for bluetooth signals and record the timestamp, along with the signal strength or the received signal strength indicator (rssi), reported in decibel-milliwatts (dbm) in android. the rssi values are negative and higher when the devices are close to each other. translating the bluetooth rssi to proximity distances for contact tracing is not straightforward since it depends on numerous factors such as the phone hardware, drivers, operating system, ability to run continuously in the background, and interference due to surfaces. yet, they have been widely attempted and deployed because of its potential advantages over manual contact tracing. 
in fact, to address some of the interoperability issues across android phones and iphones, google and apple have even introduced the google-apple exposure notifications (gaen) protocol into their os as part of their covid-19 response . the bluetrace protocol used by apps in singapore and australia is another popular standard. europe has two competing contact tracing standards that are being refined: decentralized privacy-preserving proximity tracing (dp-3t) and pan-european privacy-preserving proximity tracing (pepp-pt) . the bluetooth special interest group (sig) is also working on a contact tracing standard for wearables . such protocols help with mobility across national boundaries, avoid having to install multiple apps, and aid in the development of custom, yet interoperable, apps. besides smartphone-based apps, others have also developed hardware devices such as the tracetogether token, which uses bluetooth but operates independently of a phone, or wearables like wristwatches that can track the location using gps . in addition to bluetooth, a few apps like novid also broadcast ultrasound signals using a phone's speakers, and other apps in the vicinity detect them using their microphone . there have also been other digital apps, such as the nz covid tracer, that use qr codes for users to check in when they enter specific locations . besides contact tracing, digital tools have also been used to track symptoms among populations to identify emerging "hotspots" and for health professionals and volunteers to coordinate their response . however, the global adoption of contact tracing apps is low. the percentage of the population who have installed such apps has struggled to go past %, even among developed countries where a majority of individuals have smartphones . while there is debate on the minimal adoption rate required for contact tracing apps to have a tangible effect, some use is better than none and more is better , , . in particular, higher adoption rates in dense neighborhoods can make tracing more effective since the risk of spreading the infection is greater in closely-knit communities. there are a number of ways in which one can design such digital contact tracing apps. these offer different trade-offs in terms of individual privacy and the health and safety of the community. the target of the app may be for national/regional use or institutional use. while national-scale contact tracing apps potentially offer a greater ability to manage the pandemic, they also carry greater risks of data leaks and misuse . further, a high degree of adoption at such large scales is challenging, limiting the usefulness of the app for contact tracing. apps deployed at an institutional scale can be better targeted to the audience and offer better uptake since the data are managed at the organizational level. institutions can also respond more rapidly based on insights provided by the app. but they are less effective when users move outside the confines of campuses and interact with the broader population. for example, apps like aarogya setu and tracetogether are national apps, while gocoronago, novid, and covid watch are designed for institutions. the use of the app may be voluntary or mandatory. some countries like china have made such apps mandatory for all residents, or for those meeting certain requirements such as travelers.
even organizations may make such national or institutional apps mandatory within their premises. but most countries and institutions tend to keep the use of such apps voluntary. further, the use of the collected data for contact tracing may also be voluntary or mandatory. if voluntary, there is an explicit opt-in by the individual who is tested covid positive or is quarantined, before contact tracing using their data can be initiated. alternatively, there may be rules in place that allow the government or institutions to use any proximity data that are available with them, without additional consent from infected users. an explicit consent helps address concerns of social stigma around covid patients. the use of gcg is strictly voluntary, and there is an additional consent required by a user who is infected with covid- before their data can be used for contact tracing-this, despite their data already being available centrally in the backend. apps may collect identifiable, strictly anonymous, or pseudo-anonymous information as part of contact tracing. some apps like singapore's tracetogether compulsorily require the contact details and/or a national identification number to be shared when installing the app. this makes it quicker to reach-out to users during contact tracing, but also heightens the risk of misusing the data for the surveillance of specific individuals and can lead to a significant loss of privacy if the data arre breached. in a strictly anonymous setting, no personal information of the user is collected, and they are only identified by a random id, which itself may also be changed (or "rotated") periodically. a set of such ids may be provided by a central server (tracetogether) or generated locally by the app. during contact tracing, the user's app is alerted and they have the option of voluntarily responding by contacting the health center or a government agency. if the user uninstalls the app, it may be impossible to do contact tracing. a hybrid approach of pseudoanonymization ensures that the contact trace data themselves are anonymous, but the information required for de-anonymization is available with a trusted independent authority whose consent is required (optionally, with a consent from the infected individual) to identify the users relevant for contact tracing. gcg adopts this hybrid model that balances the privacy of users while also enabling rapid and reliable outreach during contact tracing. the contact tracing data may be kept de-centralized, semi-centralized, or centralized. if decentralized, the bluetooth device ids observed by a user's app are stored locally on the device. when a user tests positive for covid- , they can inform a backend service of their device id (potentially, multiple ids, in case of id rotation) and their status. the backend periodically relays a list of device ids associated with covid positive individuals to all apps, which is then used by the user to verify if they came in contact with a covid positive person. this is used by pact and google-apple exposure notification (gaen) framework . in a semi-centralized approach, a mapping between an app and its device id is maintained centrally, but the contact trace data remains locally on the device. on testing positive, a user may choose to (or be required to) upload the contact trace data for the recent past to a backend service, which then sends notifications to these primary contact devices asking them to quarantine or get tested. 
examples of this approach include bluetrace and aarogya setu . however, aarogya setu also allows users to voluntarily upload their bluetooth contact data to central servers at any time to get an estimate of other high-risk users in the vicinity. last, in a centralized approach, both the mapping of apps to device ids as well as their contacts are sent to a backend service periodically. when a user reports themselves as covid positive, contact tracing can be initiated on the centralized data already available, optionally after an additional consent. gcg adopts this model. this variant is relatively intrusive, but arguably has advantages that may justify its use. one, contact data from both the infected and the proximate users can be combined to increase the reliability of contact tracing. two, even if users uninstall the app, if the data collected are personalized or is de-anonymizable, then contact tracing can still happen over the backend data for the period during which the app was kept installed. three, not just primary but even secondary and tertiary contact tracing, can be performed rapidly. and four, having a centralized model allows us to perform temporal analytics on a global contact network. this can help identify high-risk individuals for prioritizing preventive, testing and (future) vaccination strategies, and infer the health of the user population. bluetooth data provide the relative interaction between proximate users but in itself does not reveal the spatial location of users. while this may disclose interaction patterns between (anonymous) users, which is necessary for contact tracing, correlating this with particular individuals is not possible without additional out-of-band knowledge about them. some contact tracing apps may also collect gps data (covid safepaths) and data from beacons or qr codes (novid) that may reveal the absolute spatial location of the users. collecting spatial location has some benefits. the coronavirus may be transmitted through surfaces or be suspended in the air and thereby be passed on to others who are not near an infected user but in the same location soon after . bluetooth based proximity will miss such users. also, gps data collection may be more reliable than bluetooth. however, gps is not precise enough to be useful for identifying proximity between users. furthermore, tracking the spatial movements of users continuously can have serious privacy consequences , . bluetooth beacons and scanning qr codes present at well-known locations can also provide such spatial information, but will be limited to places where the beacons or codes are deployed. gcg allows users to optionally share their gps data through an explicit opt-in and also allows the selective use of beacons deployed by institutions. last, we need to consider the duration for which the centralized or de-centralized data that are collected retained. this needs to be explicitly stated by the apps for transparency. more the data that are collected and more personalized it is, the greater are the consequences for retaining it longer, especially in a centralized or semicentralized setting. typically, the contact trace data themselves are useful only for roughly days after they are collected since this duration is typically the outer time-window of transmission of the virus. also, there should be clarity on how long the data are retained after a user uninstalls the app. 
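as a rough illustration of the centralized temporal contact data and multi-hop tracing described above, the python sketch below stores proximity scans as a time-annotated adjacency structure and extracts primary and secondary contacts; the identifiers, field names, and two-hop limit are assumptions for illustration, not the gcg schema.

```python
# Minimal in-memory sketch of a centralized temporal contact graph (illustrative only).
from collections import defaultdict

class TemporalContactGraph:
    def __init__(self):
        # adjacency: device_id -> neighbour device_id -> list of (timestamp, rssi)
        self.adj = defaultdict(lambda: defaultdict(list))

    def add_scan(self, src, dst, timestamp, rssi):
        """Record that `src` observed `dst` at `timestamp` with signal strength `rssi`."""
        self.adj[src][dst].append((timestamp, rssi))
        self.adj[dst][src].append((timestamp, rssi))  # a scan by either side counts

    def contacts(self, device, hops=2):
        """Return contact sets by hop: levels[0] = primary, levels[1] = secondary, ..."""
        frontier, seen, levels = {device}, {device}, []
        for _ in range(hops):
            frontier = {n for d in frontier for n in self.adj[d]} - seen
            seen |= frontier
            levels.append(set(frontier))
        return levels

g = TemporalContactGraph()
g.add_scan("dev-A", "dev-B", 1600000000, -70)
g.add_scan("dev-B", "dev-C", 1600000300, -80)
print(g.contacts("dev-A"))  # [{'dev-B'}, {'dev-C'}]
```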
gcg deletes a user's phone number, the only personal data they may share, from its backend within months of them uninstalling the app. the anonymized contact trace data are retained for future research purposes, as per the rules set out by the institute human ethics committee (ihec). the gocoronago (gcg) contact tracing platform consists of a smartphone app and backend services for data collection, management, and analysis. the app is designed for covid-19 operations and management within an institution and is also proposed as a research project governed by ihec. the design and technical details of the app and the backend services are described in this section, and a high-level design is illustrated in fig. . (two terms used below: a quick response (qr) code is a 2-d barcode standard that serves as a machine- or device-readable label encoding information such as an identifier, a physical location, or a url to a website; smartphones can use their cameras to photograph the qr code, and apps or libraries can extract the information present in it. a beacon is a compact device that can be configured to continuously broadcast an identifier and some custom data as part of a bluetooth signal; other bluetooth-enabled devices can detect these signals to get information, typically specific to the location of the beacon.) the gcg app is limited for use by authorized institutions. since not all institutions may have a private/enterprise app store for their organizations, hosting the app in the public google play or apple app store is convenient. users at authorized institutions are provided with individual invitation codes by a separate entity within the institution, typically the information technology (it) office. the it office also maintains a mapping from the user's unique invite code to the actual individual to whom the code was provided, along with their contact details, as shown in fig. . this mapping from the individual to their invitation code is later used by the it office during contact tracing, as described in sect. . . the user can download the gcg app from the google play store or from an institutional download link. during installation, users enter this invite code into the app, which submits it to the gcg backend servers for validation and is returned a unique id, a device id, and a pin. the gcg backend maintains the mapping from the invite code to the unique id for the installed device. the invitation code can only be used once, for the first installation. to allow future re-installations, a pin is generated for this invitation code and is shared with the user. optionally, the user may provide their one-time password (otp)-verified phone number during installation, which is recorded in the backend. this number can be used along with the pin to reinstall the app in the future, in place of the one-time-use invite code. last, a device id in the form of a random 128-bit uuid is generated by the backend for each re/installation on a phone, and a mapping is maintained from the unique id to the device id, along with the creation timestamp. this device id is broadcast as part of the bluetooth advertisement (fig. ) . both the invite code to unique id and unique id to device id mappings are used during contact tracing (sect. . ) . a final piece of information collected from the app during re/installation is the make and model of the phone.
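the onboarding flow above can be sketched as follows; this is a minimal, in-memory python illustration (the real backend stores these mappings in mariadb), and all names, id formats, and the pin scheme here are assumptions.

```python
# Hypothetical sketch of invite-code onboarding: invite -> unique id -> device id mappings.
import secrets, uuid

invite_to_unique = {}   # invite code -> unique id (held by the backend)
unique_to_device = {}   # unique id   -> current device id
device_models = {}      # device id   -> phone make/model (recorded for later RSSI calibration)
used_invites = set()

def register(invite_code, phone_model):
    if invite_code in used_invites:
        raise ValueError("invite code already used; re-install with PIN and phone number instead")
    used_invites.add(invite_code)
    unique_id = secrets.token_hex(8)                # assumed format
    device_id = str(uuid.uuid4())                   # random 128-bit UUID, broadcast over BLE
    pin = f"{secrets.randbelow(10**6):06d}"         # one-time PIN for future re-installations
    invite_to_unique[invite_code] = unique_id
    unique_to_device[unique_id] = device_id
    device_models[device_id] = phone_model
    return unique_id, device_id, pin

print(register("INVITE-1234", "Pixel 4a"))
```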
as we discuss later, this is vital for interpreting the bluetooth signal strength and translating it into a distance estimate. these identifiers are designed to maintain the anonymity of users from the gcg app and backend, enable de-anonymization of contact users upon an authorized request for contact tracing, and ensure that the app can be re/installed by authorized users. such sandboxing and identifierindirection ensures that no single entity -the it office, a gcg user, or the gcg backend-can independently find the identity of any (other) user and their trace. a key tenet of gcg is transparency. the installation process in the gcg app has disclosures on the legal terms and conditions for the use of the app, and on how the data collected will be used. in addition, there is also a multi-lingual informed consent, as required by ihec, which clearly documents the scope of the research project, potential benefits and downsides, voluntary participation, etc. the gcg app uses bluetooth low energy (ble) signals to detect other proximate phones running the app. the ble wireless protocol is ubiquitous among smartphones sold within the last years. it enables low-power, short-range wireless communication and is intended for applications in fitness, smart homes, healthcare, beacons, etc. its maximum range is < m though this is affected by environmental conditions and transmitting power, and ≈ m is the typical range . ble devices use an advertising and scanning protocol to discover each other and establish a connection. when acting as a server, the devices advertise one or more services that they support, which are identified by service assigned numbers; when acting as a client, they find servers to connect, to based on the advertised service assigned numbers. a single device may advertise multiple services, and it can include a custom payload such as a service name. also, the ble advertisement is broadcast in an open channel, which any nearby ble client can discover. besides standard bit service numbers that are registered and pre-defined for specific types of services, applications can also generate and use bit uuids for custom services they provide. once discovered, clients can establish a network connection with the service to perform additional operations such as data exchange. the gcg app acts as both a client and a server when using the scanning and advertising capabilities of ble, respectively. specifically, it advertises two service assigned numbers, x , which represents a generic access service, and another custom service whose assigned number is the unique device id for a particular app installation. this advertisement is broadcast continuously. as a client, the gcg app scans for secs every minute for advertisements that contain the service number x . if found, it extracts and records the device id that is sent as a secondary service number in the same advertisement. piggy-backing the device id as a service assigned number rather than a custom payload takes fewer bytes, which in turn can reduce the power consumption for the advertisement. as part of the scanning, the gcg app also retrieves the received signal strength indicator (rssi), which is the strength of the ble signal that is received by the app. as we discuss later, this can be used to estimate the proximity distance. the gcg android app uses the default ble settings for broadcasting its advertisements, which translates to ble broadcasts every sec at a medium transmission power level. 
also, the app consciously does not establish a connection with apps on another device; the device id is broadcast to any ble device that is in the vicinity. in fact, we explicitly set the connectable flag in the advertisement to false. this enhances security by avoiding malicious content from being transferred. while such proximity tracking is helpful for contact tracing of individuals who were spatiotemporally co-located, this does not address situations where two users shared the same space, such as an atm, mess dining hall, or campus grocery, but for a short time apart. since covid- can be transmitted through surfaces and can linger in the air for some time , it is beneficial to identify users who were in the same location but not at the same time, especially for locations with a lot of footfall. the gcg app allows users to voluntarily share their gps location information with the backend. this is disabled by default. if enabled by the user, the gps location is retrieved and uploaded to the backend every mins, and buffered for retries. since the sharing of gps location is strictly voluntary, gcg supports the selective use of beacons installed by institutions at such highrisk spaces. these beacons behave like a gcg app that passively advertises its device id, and the smartphone app can scan for and record the beacon's id, just as it would detect another gcg smartphone's device id. specifically, we use the ibeacon protocol from apple. the beacon transmits a static gcg uuid as its service number, x c, as the manufacturer id for the protocol, and a major and minor version number to uniquely identify itself. the gcg app scans for the static service number, filters results based on the manufacturer id, and retrieves the major and minor version numbers. the app encodes these version numbers into a template uuid to form a unique device id for that beacon and adds it to its proximity trace. during each scan, the proximity data collected consist of zero or more device id(s) and the corresponding rssi values that were discovered at that timestamp. performing a service call to send these data to the backend servers consumes power and bandwidth on the phone. instead of sending these data after each scan, we buffer it to a sqlite database on the phone and periodically send the buffered data to the backend in a single batch. this transmission interval is set to mins. this type of batching amortizes the power and network costs across scans, while ensuring the freshness of the data available at the backend. buffering is also beneficial when internet connectivity is intermittent. if the proximity data cannot be sent to the backend, the buffered data are retained on the device and a resend attempt is made in the next transmission interval. given that this is the most frequent service call to the backend, we use a compact binary serialization to represent the proximity data sent to the backend, unlike the other services which use json. the gcg app needs to run in the background all the time for effective bluetooth advertising, scanning, and proximity data collection. however, the heterogeneity of smartphone models and the limitations of their os means that this advertising and scanning may not be reliable. to identify issues with specific device models and app installations, and verify if the app is running, we collect and report liveliness telemetry statistics to the backend every hour. 
these include a count of ble scans performed, ble scans failed, gps scans, gcg users and beacons detected, and the contact buffer size; bluetooth and gps enabled status, bluetooth and gps permission flags, battery level, app version, etc. these statistics also help us in understanding the aggregate usage of the gcg app within an institution. besides tracking bluetooth contact data, the gcg app offers several features to inform the users about covid-19 and engage them in preventing its spread. screenshots of these ui elements are shown in fig. . key among these is a proximity alert, wherein a notification is triggered on the smartphone if or more users (configurable) were detected within a ≈ m distance during the last bluetooth scan. this acts as a warning to users in case they inadvertently overlook social distancing. as discussed later, the m distance threshold is just an estimate based on the rssi. the alert is also triggered only once an hour (configurable) to avoid saturating the user. in addition, users can visualize a plot of the hourly count of contacts segregated by the duration of contact within the hour, e.g., < mins , − mins and > mins (fig. b) . this gives them a sense of their interaction pattern for the past hours. similarly, we also display the number of scans performed each hour for the past h (fig. c) . this can help identify issues with bluetooth scanning on specific phones, and prompts the user to take corrective measures. a summary of the number of scans completed per day is also shown as a progress bar to motivate users to hit or more of the possible min scans (fig. a) . these local analytics within the app are complemented by aggregate analytics performed in the backend and shared through the app each day. these include the social distancing score, a user density heatmap for neighboring locations, and a visualization of the contact network neighborhood, which are described later in sect. . a unique aspect of the app is that the set of remote analytics available can be dynamically changed without having to update the app. in the future, this can also be used to push forms and conduct surveys from within the app. importantly, none of the analytics provided to users reveals the identity of other users or even their device ids, to protect their privacy. for example, the hourly contact bars only report the aggregate counts of nearby devices and the cumulative duration of interaction at different distances, while the proximity alert is triggered only if at least three users are nearby, to prevent fine-grained estimates of the number of gcg users from being revealed. last, we also provide helpful information to educate users about covid-19. these include a plot of the positive, recovered, and deceased cases across time in india and in the local state, and a map of the current positive cases at the state and district level. in addition, we also share let's control covid and curious about covid? infographics as app alerts each day, which suggest precautions, debunk myths, and offer scientific information (fig. f) . these are sourced from public health and science resources such as who, the covid gyan initiative from iisc-tifr, and indian scientists' response to covid-19. the features described here are largely applicable to gocoronago v . on android smartphones. gocoronago v . is a lighter version available for ios with features limited to advertising, scanning, and receiving alerts.
this is due to the limited number of iphone users on the academic campus. there are other os and device-specific issues as well that we encountered and addressed in various iterations of the app. while we were initially using wildcard filters when performing bluetooth scans for service numbers on the android app, we noticed that certain phone models such as samsung did not reliably perform such scans. this led us to adopt the x approach. continuous bluetooth advertisement and scanning in the background is challenging in android, and virtually impossible in ios. smartphone brands with custom android builds, such as xiaomi, oppo, vivo, etc., do not always support the recommended practice of executing such applications as a foreground service with a persistent, ongoing notification. as a result, users are forced to change the android battery usage settings and/or autostart permissions for the gcg app, which are brand and even model specific. the absence of reliable scanning and advertising defeats the key purpose of the app. we provide local analytics and alerts to help users address such issues. further, android requires users to enable gps to even perform continuous bluetooth scanning, as a way to indicate to users that their location may be revealed indirectly, say, through beacons at well-known locations. but requiring gps to be on even though the app does not collect the gps location without opt-in confuses users, and may lead to privacy concerns. on ios, the problems with background bluetooth advertisement and scanning are well documented due to apple's restrictive policies , , . the ios gcg app is effective when in the foreground and when the user is viewing the app. however, when the user is not actively using the app or the phone is locked, the app can scan for other devices that are advertising, but it cannot advertise. as a result, there need to be other android or active ios gcg devices nearby for contacts to be recorded, colloquially referred to as "android herd immunity" . besides technical challenges, there are also policy challenges in deploying covid-19-related android and ios apps to the google play and apple app stores. certification from an official government of india agency with specific verbiage was required before the gcg android app would even be reviewed for hosting on the play store, and the subsequent reviews of the app's updates take weeks. given the restrictions that apple imposes on apps posted on its app store, the ios gcg app is only viable for an ad hoc or enterprise license deployment. gcg web services, data management, and analytics are hosted on the microsoft azure public cloud. as shown in fig. , these are present on different virtual machines (vms) that are segregated based on their workload (service endpoint, data management, analytics) and their security zone (internet, intranet, and internal). we describe these backend capabilities next. (a virtual machine is a computing environment that provides all the functionalities of a full computer but executes within another computer; a vm is the typical unit of renting a computer in public clouds, and vms help divide a single large server in the cloud into multiple smaller computers that are independently rented to different users. a public cloud, in turn, is an internet-based service that allows users to rent and access remote computation, storage, and software capabilities hosted at large data centers managed by service providers like microsoft, amazon, and google; it reduces the cost and effort of managing physical computing infrastructure at an organization, at higher reliability and scalability.)
a suite of rest service application programming interfaces (apis) is defined for the gcg app to interface with the backend, to upload data, and to download analytics and alerts. (representational state transfer (rest) is a software architecture that allows desktop and mobile clients to interact with internet services by passing requests and receiving responses, using web standards such as http and data models like json; an api is a description of the input and output parameters that are received and returned when accessing a capability offered by an application.) the rest services are implemented using java servlets running on the apache tomcat web server, and their service endpoints are accessible on the internet. these apis include register device, add proximity contacts, add gps, add liveliness, get notifications, and fetch analytics. most use json as the rest body, except add contacts, which uses a binary protocol. the register device api accepts an invitation code from the app, checks a mariadb table to verify that the code is present, not expired, and not yet used, and if so, generates a random device uuid, a random pin, and a unique id for the user, which are returned to the app. these mappings, as described earlier, are maintained in mariadb. the phone number, if provided, is salted, hashed, and stored in the database for comparison in the future if a user reinstalls the app. the number is also asymmetrically encrypted and stored in the database, so that it can be decrypted upon authorization by the institution's advisory board, if needed. the decryption key is stored securely off-cloud to prevent accidental breaches. the add contact api is the most frequently invoked, once every mins by a potentially large number of users. to avoid the power, compute, and network overheads of de/serializing json, we use an alternative binary format. it starts with bytes of the source device id, followed by a series of scan records, one per scan. each record starts with bytes of unix epoch time in seconds, giving the scan record's timestamp. the next byte indicates the number of device contacts 'n' in that scan, followed by × n bytes containing the byte device id and byte rssi value for the n proximate devices. if more than n = devices are found in one scan, the app creates multiple scan records. records are created and sent by the app even if there are no proximate devices, since this information is also useful. as mentioned before, beacons are also encoded as device ids following a standard uuid template. intuitively, each record forms an adjacency list for the contact graph. the binary records from service calls from all users are appended to a file, and every h a pre-processing service fetches these binary files and generates a corresponding csv file with an edge list consisting of the timestamp, source device id, sink device id, and rssi. this csv file is backed up to azure blob store and, as discussed later, stored on hdfs for further analytics. add gps is the next most frequently called api, every mins, for users who choose to share their gps location. these data are used to generate a device density heatmap of the user's neighborhood for the recent past, and potentially for contact tracing. to support such spatio-temporal queries, we use the influxdb temporal database to store the gps data. one copy of the latitude and longitude is asymmetrically encrypted and stored in influxdb, along with the timestamp, to support authorized contact tracing. another copy is transformed using a geohash of characters, which reduces the precision of the location to a m × m grid. (a geohash is a mechanism to encode a location as a compact sequence of letters and numbers that is easy to remember, compared to latitude and longitude; typically, longer hashes offer higher precision.)
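a sketch of how such a binary add-contact payload could be parsed is shown below; the field widths (a 16-byte device id, 4-byte epoch seconds, 1-byte contact count, and 1-byte signed rssi) are assumptions for illustration, since the exact sizes are defined by the gcg wire format.

```python
# Parser sketch for a binary add-contact payload; field widths are assumed, not normative.
import struct

def parse_contacts(buf: bytes):
    src = buf[:16].hex()                              # assumed 16-byte source device id
    records, off = [], 16
    while off < len(buf):
        ts, n = struct.unpack_from(">Ib", buf, off)   # epoch seconds, contact count
        off += 5
        contacts = []
        for _ in range(n):
            dev = buf[off:off + 16].hex()
            rssi = struct.unpack_from(">b", buf, off + 16)[0]
            contacts.append((dev, rssi))
            off += 17
        records.append((ts, contacts))                # one adjacency-list entry per scan
    return src, records

payload = bytes(16) + struct.pack(">Ib", 1_600_000_000, 1) + bytes(16) + struct.pack(">b", -70)
print(parse_contacts(payload))
```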
when generating the heatmap for the app user's current location, we query over these geohashed records. the app communicates hourly device health data using the add liveliness api, as a set of key-value pairs that has evolved over app versions. as a result, we store these data within azure cosmos db, which is a nosql database. these data are later queried to identify devices that are not reporting bluetooth data reliably, so that alerts with possible fixes can be sent, and also to monitor the overall status of the gcg deployment at an institution. alerts are sent to the app using a custom notification service in the backend that the app polls every mins. this approach was initially chosen over google or apple's push notifications to reduce the dependence on external services. alerts that are generated by various analytics are inserted into a mariadb table with the device id, title, content, type, and validity time range. when an app polls the service, any pending alerts for that device are returned. besides displaying alerts to the user, they may also carry a special payload that triggers changes to the ui, such as updating the social distancing score on the main screen. user-level analytics, such as displaying their contact network, and other analytics, such as the user density, are sent to the app as html that is locally rendered. the app invokes a get analytics api, which returns a json containing a list of current endpoints that serve the analytics. the plots and maps are served off an apache instance. separately, we also run our own open street maps tileserver for serving the map tiles. these external-facing services are hosted on a separate set of vms over which the services are distributed based on their workload and to avoid performance interference. these vms are shown in orange in fig. . we use one azure d s v vm to host the register device, add gps, and add liveliness endpoints, a second one that exclusively runs the add contact endpoint, and another to run the get notifications service; the latter two see a higher load. the tileserver for displaying open street maps, which is only occasionally used, runs off an azure b s vm, while the analytics are served from an azure d s v vm. a separate azure d s v vm hosts the mariadb and influxdb instances used by these services. besides the internet-facing services, there are internal services to support the gcg platform. these are used to host an operations portal to oversee the health of the system, on-boarding of devices, and visualization of the contact network. the portal does not directly access any user database or files, to prevent accidental access to or modification of the raw data. instead, a separate routing service offers a limited set of well-defined services to access authorized data. these apis are periodically called and the results are cached in a separate mariadb instance used by the portal. the portal and its database are also hosted on separate vms, shown in yellow in fig. .
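the poll-based alert delivery described earlier in this section can be sketched as follows; this is an illustrative, in-memory python version, whereas the real service stores alerts in mariadb and is exposed as a rest endpoint, so the structures and field names here are assumptions.

```python
# Poll-based alert delivery sketch: analytics push alerts, apps poll for pending ones.
import time

alerts = []   # each alert: device_id, title, content, type, validity window

def push_alert(device_id, title, content, alert_type, valid_for=86400):
    now = time.time()
    alerts.append({"device_id": device_id, "title": title, "content": content,
                   "type": alert_type, "valid_from": now, "valid_to": now + valid_for,
                   "delivered": False})

def poll_alerts(device_id):
    """Called by the app every few minutes; returns pending, still-valid alerts."""
    now, pending = time.time(), []
    for a in alerts:
        if a["device_id"] == device_id and not a["delivered"] and a["valid_from"] <= now <= a["valid_to"]:
            a["delivered"] = True
            pending.append({k: a[k] for k in ("title", "content", "type")})
    return pending

push_alert("dev-A", "social distancing score", "your score today is 98", "score_update")
print(poll_alerts("dev-A"))   # first poll returns the alert
print(poll_alerts("dev-A"))   # later polls return nothing new
```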
the sandboxing through the routing service also extends to the analytics services, which likewise do not directly access the user databases for sending alerts or generating visualizations, but operate through this routing api. for example, the liveliness data are fetched every mins through this routing service from cosmos db into mariadb for the portal to visualize the number of scan records received and scans failed among the apps, while the device registration summary is fetched through the api to plot the users on-boarded over time, the distribution of their device makes and models, etc. ensuring the security of the services and the data collected by the gcg platform is of paramount importance and is intrinsic to various design and deployment choices. all the rest endpoints use http/ with http strict transport security (hsts), which forces the use of a transport layer security (tls . /ssl) encrypted channel between the gcg app and the backend and prevents man-in-the-middle attacks. further, all service calls are authenticated based on a device key that is returned to the app during registration. to ensure that this service call authentication is light-weight, we use a digital signature protocol, which ensures that each call can be validated locally, without the need for any database (fig. ) . specifically, the device key is generated by the backend service as key = base (sha (device id, salt)), where salt is a secret phrase known only to the service. the gcg app encrypts and stores this device key on the phone. subsequently, when invoking any backend service, the app sends its device key, the current timestamp, and a signature, which consists of sign = base (sha (device id, timestamp, device key)), as part of its https header or body. the service then uses the received device id to generate the device key on the fly, and additionally uses the timestamp to generate the signature. it also verifies that the timestamp passed is recent, to mitigate replay attacks. if the generated signature matches the received signature, the request is valid and is executed. note that all of this flows over an encrypted https channel. various other security best practices are used. the register device service takes measures to mitigate brute-force attacks using random invitation codes and pins by limiting the number of daily attempts. internal services such as the portal are only accessible from the institution's private network, over vpn, and are additionally secured using authentication. firewall rules are used to restrict access to unused ports. direct ssh access is not available to any vms running services or the database. the internet-facing vms are in a separate subnet from the ones hosting the databases and internal services on azure, to keep the networks in different security domains. data flows between the services and databases/storage are tightly controlled, and a routing service is used for internal services. we run the latest stable release of all software and apply the latest security patches to protect against known security flaws. the mariadb sql database follows the principle of least privilege for access, and only minimal permissions for select or select/insert are given to user accounts. user-defined functions are disabled.
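returning to the request-signing scheme described earlier in this section, a minimal python sketch follows; the choice of sha-256, base64, the field separator, and the replay window are assumptions, since the exact hash, encoding, and skew tolerance are set by the gcg backend.

```python
# Sketch of stateless request signing: key derived from device id + secret salt,
# signature recomputed on the fly by the service, timestamp checked against replays.
import base64, hashlib, hmac, time

SALT = b"backend-secret-phrase"    # known only to the backend service (assumed value)
MAX_SKEW = 300                     # allowed clock skew in seconds (assumed replay window)

def derive_key(device_id: str) -> str:
    return base64.b64encode(hashlib.sha256(device_id.encode() + SALT).digest()).decode()

def sign(device_id: str, timestamp: int, device_key: str) -> str:
    msg = f"{device_id}|{timestamp}|{device_key}".encode()
    return base64.b64encode(hashlib.sha256(msg).digest()).decode()

def verify(device_id: str, timestamp: int, signature: str) -> bool:
    if abs(time.time() - timestamp) > MAX_SKEW:      # stale timestamp, possible replay
        return False
    expected = sign(device_id, timestamp, derive_key(device_id))  # key re-derived on the fly
    return hmac.compare_digest(expected, signature)  # constant-time comparison

ts = int(time.time())
device_id = "3f2a"                                   # hypothetical device id
print(verify(device_id, ts, sign(device_id, ts, derive_key(device_id))))   # True
```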
all queries are templatized to avoid sql code injection. sensitive data such as phone number and location are kept hashed and/or encrypted when stored. this prevents privacy from being compromised even if there is a cloud security breach and the data are leaked. we use asymmetric public-private keys so that only public keys are hosted on the vm for encryption and private keys for decryption are kept securely offline. contact data are backed up to azure encrypted blob storage. the backend services have undergone professional vulnerability and penetration testing by crossbow labs. the gcg app is designed to provide feedback to users on their daily interactions using simple metrics and contact neighborhoods. additionally, to improve user engagement, the app also provides heatmaps of user density and charts and maps that show the covid- situation in various states and districts around the country. in this section, we describe these features along with the contact tracing protocols that are in place if an app user tests positive. we receive contact records from various devices that contain the contact timestamp and associated bluetooth signal value. for efficient primary and secondary contact tracing, we periodically stitch these contact records to create a global contact network graph. further, we annotate the edges with the contact timestamps and signal values to creating a temporal contact network or a temporal graph. we use apache spark to perform this stitching from the csv edge file, as a pre-processing step. specifically, we create an interval graph for scans received during a specific time interval. the spark application takes a start and end time for the interval, and then filters in all the edge list entries in the input csv file whose timestamp falls within this time interval. it then groups all edges by their source and sink vertices to create an adjacency list for each vertex that includes all scan entries from either source or sink edges. every edge is characterised by a time interval [t s , t e ) , where t s is the earliest scan timestamp and t e is the latest scan timestamp between the connecting devices, during that interval. scans on an edge that fall on adjacent time points with the same rssi value are combined to form longer intervals on the edge annotations. this gives a set of disjoint sub-intervals on the edge with an associated bluetooth signal strength. the output is stored in hdfs for future analysis. temporal graph: like a regular graph, a temporal graph (or temporal network) is a collection of vertices and edges between vertices that indicate a relationship between them. but the vertices and edges that exist at different points in time may vary, and their attributes may also change over time. e.g., temporal graphs model interactions in a social network, traffic flow in a road network and proximity contacts in a contact tracing network. the social distancing score provides users with a measure of their extent of social distancing, on a daily basis. unlike the local bluetooth data used to plot the contact counts on an hourly basis within the app, the social distancing score uses more global knowledge from a device and its neighbors. in particular, it accounts for "background devices" that are often or always in the vicinity, such as family members or hostel room neighbors, and which are subtracted from this score as their sustained presence does not pose any additional risk. 
these scores are calculated using apache giraph once a day, over the interval graph created for the preceding -h period. the score calculation depends on three parameters: signal threshold (δ) , minimum contact duration (φ m ) , and background contact duration (φ b ) . for each device id, we first identify those neighboring devices that could detect each other for at least φ b mins , cumulatively, during the -h period. these neighbors form the background devices and are eliminated from further analysis. currently, we use φ b = mins. next, from the remaining neighbors, we retain only the rssi entries which exceed a value of δ on their edge sub-intervals. this helps identify the duration of nearby contacts with them. based on experiments described in the next section, we set δ = − , which approximates a distance of m. we sum up the duration of nearby contacts for each edge, and those whose duration is greater than φ m mins form the proximate contacts, p. we set φ m = mins by default. intuitively, this means that the user has interacted with p other devices in close physical proximity of about ≤ m for a cumulative of mins or more in the past h, but who are not part of the sustained background presence. from this, the social distancing score for a device is calculated as max{ , − p} . this normalization offers a higher score for users who practise social distancing and a lower score for the others. in the example snapshot, assume that δ = − , φ m = mins and φ b = min . for the device c, devices b and d are proximate contacts since their close contact durations are h and h, respectively. however, a is not a proximate neighbor of c since it is a part of its background, having been detected for a total of h. so the social distancing score of c is . measures the sars-cov- virus is currently assumed to spread by 'contact and droplet' as well as airborne transmission . who and various countries have provided social distancing advisories that emphasize a minimum spacing of - m for curbing the spread of the virus , , , , . being able to nudge users to maintain such distancing is one of the goals of the gcg app. however, inferring distances accurately from bluetooth rssi values is non-trivial. factors such as smartphone hardware variations, body interference, and multi-path interference lead to both false-positives and false-negatives while estimating the distance from rssi values , . researchers elsewhere have conducted experiments to understand if contact tracing apps can estimate if two users are close to each other, i.e., within a distance of m for mins or longer . these were performed with google pixel and samsung galaxy a devices using the open-trace app, an open-source version of singapore's tracetogether app . they used different environmental conditions such as signal attenuation by the human body, a handbag, walls, etc. and also by enacting real-world scenarios. the measured rssi and the distance are plotted over time to understand the variability for different configurations and their relationship to the ground truth. another smart contract tracing (sct) system uses machine learning classifiers to classify the contacts as high/low risk using the bluetooth rssi values. they perform experiments to collect rssi from a nokia . with android and htc m with android . for distances ranging from . - m, and for random device orientations, and at different locations such as hand, pocket, and backpack. the collected data are labeled as + (high-risk, ≤ m ) or − (low-risk) according to the ground truth. 
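a simplified python version of this score computation is sketched below; the threshold values (the signal cut-off delta, phi_m, phi_b) and the base score of 100 are placeholder assumptions, and the production job runs in apache giraph over the daily interval graph rather than in plain python.

```python
# Social distancing score sketch: drop background devices, count sustained close contacts.
def social_distancing_score(edges, delta=-65, phi_m=30, phi_b=240, base=100):
    """edges: neighbour -> list of (duration_minutes, rssi) sub-intervals for one device."""
    proximate = 0
    for neighbour, subs in edges.items():
        total = sum(d for d, _ in subs)
        if total >= phi_b:
            continue                      # background device (e.g. a hostel-room neighbour)
        nearby = sum(d for d, rssi in subs if rssi >= delta)   # stronger signal => closer
        if nearby > phi_m:
            proximate += 1                # sustained close-range contact
    return max(0, base - proximate)

edges = {"dev-B": [(120, -60)], "dev-C": [(45, -60)], "dev-D": [(300, -55)]}
print(social_distancing_score(edges))     # dev-D is background; dev-B and dev-C count: 98
```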
they filter the data using a moving average filter before training using machine learning classifiers like decision tree, linear discriminant analysis, naïve bayes, k nearest neighbors, and support vector machine. the google-apple exposure notification api in android also applies ble calibration corrections based on manual measurement of the signal strength under standard conditions. given the hardware diversity we observe among our campus population, we conduct similar lab-scale experiments, as described, using a more diverse number of smartphones and beacons. we evaluate the effect of rssi at , , and m distances to help us determine whether two phones are within m. we use a debug version of the gocoronago android and ios apps that log the bluetooth scan information to a local file on the smartphone in our experiments. the experiment was performed in an open room measuring about × m with few furniture, mimicking a real-world environment. our experiment uses android devices, iphones, and all the devices were used at a high battery level, with power-saving modes disabled and screen set to stay on for as long as possible while performing the bluetooth scans. each experiment configuration was performed for a period of mins to give ≈ rssi measurements per device pair in that configuration. given the technical limitations of ios, android devices can detect other android devices and the beacons, and iphones can detect the android devices. considering these factors, two experimental setups were designed to collect the rssi data as illustrated in fig. . for the distance a = m , we use a hexagonal placement, as shown in fig. a , with pairs of devices at the vertices, a, b, c, d, e, f, and the center, g. these give us devices at distances of m (same vertex); m, between adjacent vertices, e.g., a-b; m, between vertices at diagonal corners, e.g., a-d; and √ m for vertices that are two hops away, e.g., a-c. three runs with the hexagonal setup are required to ensure that every pair of devices is measured at a m distance. for distances a = m and m the devices were arranged in three clusters, a, b, c, at the corners on an equilateral triangle with a side of length a (fig. b) . in each cluster, the devices are placed vertically and adjacent to each other, in a row. devices across clusters are separated by a distance a while those within a cluster have a distance of ≈ m . three runs of the triangular setup with different clusters are performed to ensure that we get the rssi for each pair of devices at m and m. a key rationale for this study is to understand if two devices are within m of each other or not, as we use the m distance as the proximity threshold in our platform. a total of rssi data points at m, data points at m, and data points for m are collected. we focus our analysis on just the android phones, which form the bulk of our deployment. there are , , and data points for , , and m between the android devices, respectively. for each distance and a device pair, we drop the maximum and minimum rssi values to eliminate outliers. an empirical cumulative distribution function (cdf) of the rssi values at , , and m are shown in fig. a . the x-axis shows the rssi values, while the y-axis lists the corresponding percentiles for different distance configurations. we see that there is a substantial overlap between data points at the three different distances for a given rssi. 
for example, for an rssi of ≤ − , we have % of the m data points, % of the m data points, and % of the m data points fall within that signal strength. so, using any single threshold value of rssi as an estimate for a m distance is liable to result in both false positives and false negatives. for this preliminary study, we wish to determine an rssi value that is the most discriminating with regard to the ≤ m and > m proximity. so for each rssi value, we plot the difference in the percentile of data points that are at m and at m distances, and this is shown in fig. b . the peak difference is observed at an rssi value of − , i.e., the difference between the true positive of m ( %) and false positive of m ( %) is the highest. hence, we use an rssi of − as the proximity threshold in our gcg app and the backend analytics. in the future, we propose to study the effect on rssi from different pairs of phone models and in different environmental conditions in order to develop a more customized proximity threshold, instead of using a single global value that is currently adopted. when an app user tests positive for covid or is under mandatory quarantine, the current protocol at iisc requires the campus health center to check if the user is willing to share their contact data for tracing. if so, they are asked to enter their phone number within the gcg app, if not done so. the health center collects and enters the gcg unique id, device id suffix, and phone number from the user into a portal. this initiates a call to the gcg backend and triggers an otp to the user's phone number, if the details match with an existing user. the user may share this otp with the health center and this serves as their informed consent for contact tracing. the health center enters the otp and any additional details about the subject, such as symptoms, start and end dates for contact tracing, and test information. the gcg backend confirms if the otp is accurate, and if so, the request is forwarded to the advisory board to get the primary and secondary contacts for this user. the advisory board has representatives from the institute, including faculty, staff, students, doctors, and a bio-ethicist. if the board approves the request through their portal, the gcg backend is notified and it will perform a time-respecting breadth first search (t-bfs), which is a variant of breadth first search (bfs) performed over the temporal contact graph. the t-bfs will be initiated from the device id corresponding to the given user's unique id and for the time duration in the past indicated by the health center. if the user's unique id is associated with multiple devices during this period, the search will be initiated from each of these ids. the output is a list of device ids for the primary and secondary contacts. we then use the invitation code, unique id and device id mappings maintained in the gcg backend to get the list of invitation codes used by the primary and secondary contacts. these invitation codes are shared with the it staff, who then use their mapping table to deanonymize them and provide the health center with a list of email ids and/or phone numbers of these contacts. the gcg backend also provides the duration of contacts for each of the invite codes. the health center can then choose to initiate their relevant protocols for reaching out to these contacts, and quarantine or test them. if mandated by law, the health center may share the contact trace data with the local government agency responsible for covid- surveillance. 
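a minimal sketch of a time-respecting breadth first search over the temporal contact graph is shown below; the data layout, the two-hop limit, and the single-visit simplification are assumptions for illustration rather than the exact gcg implementation.

```python
# Time-respecting BFS sketch: a contact is only followed if it occurs no earlier than
# the time at which the previous hop was reached.
from collections import deque

def t_bfs(edges, source, t_start, max_hops=2):
    """edges: device_id -> list of (neighbour, contact_time).
    Returns {device: hop} for primary (1) and secondary (2) contacts.
    Simplified: each device is visited once, at its first time-respecting discovery."""
    reached = {source: t_start}      # earliest time at which each device was reached
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if hops[u] == max_hops:
            continue
        for v, t in edges.get(u, []):
            if t >= reached[u] and v not in reached:   # only forward-in-time contacts count
                reached[v], hops[v] = t, hops[u] + 1
                queue.append(v)
    return {d: h for d, h in hops.items() if d != source}

edges = {"A": [("B", 10)], "B": [("C", 20), ("D", 5)]}
print(t_bfs(edges, "A", t_start=0))   # {'B': 1, 'C': 2}; D is skipped (contact before B was reached)
```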
engagement. besides the local analytics within the app, we also provide additional analytics to the gcg user based on aggregation in the backend. figure d shows a heatmap of the gcg user count in a . × . km area around the current location of an app user, if they share their gps location. it is aggregated over the past h from users who share their gps data. these data are queried from the timestamps and geohashes present in the influxdb backend. in order to respect privacy, the location data are spatially coarsened into tiles of approximately m × m , and temporally coarsened over h, and only the aggregate count of users in each tile is shown. also, when few users are present in a tile, we display these data in a categorical manner, e.g., < . the contact graphs that are constructed in the backend can be visualized using tools such as gephi. figure shows a subset of the temporal graph generated for a single day. here, the size of a node depends on its degree centrality measure across the entire time duration, and the thickness of the links depends on the duration of their contact. (a centrality measure is a graph-theoretic score that measures the relative importance of vertices in their ability to spread or influence other vertices in the network; examples include degree, betweenness, eigenvalue, closeness centrality, page rank, etc., and they are used to identify important or critical vertices in contact networks, social networks, www graphs, road networks, etc.) while such a graph is instructive for backend analytics, we use it to generate a neighbourhood tree for each user, as shown in fig. e . the tree is based on the last h of data and contains contacts up to two hops. importantly, this is a tree and not a neighborhood sub-graph, to preserve privacy, i.e., edges between the -hop and -hop neighbors are not shown to avoid revealing contact patterns between them. these trees are generated on a daily basis. they help the users get a sense of not just their primary contacts but also their secondary contacts, which could be a much larger set, and in turn motivate users to take greater precautions by socially distancing. the gcg app is currently deployed at the indian institute of science (iisc), bangalore. the iisc campus is an access-controlled residential campus with close to students, over faculty, and over research and administrative staff. a majority of the students and faculty live on campus. however, iisc entered a full shutdown in march, , a few days ahead of a nation-wide lockdown in india, and the students on campus were instructed to leave for their homes. initial versions of the app were tested among faculty volunteers during the lockdown period. the gcg app was first rolled out to students in june, after a subset of them were allowed to re-enter campus, and subsequently to other faculty and staff. at the time of writing this paper, the gcg app has been installed by over users at iisc. a plot of the number of installations of the gcg app over time is shown in fig. . sharp jumps in installations correspond to new invitations or reminders sent to students, faculty, and staff for installing the app. the app is yet to be rolled out to essential workers such as hostel cooks, cleaning staff, and security personnel, and noticeably, some of the early cases of covid-19 on campus have been initiated through them. this is understandable since many of them stay off-campus and possibly have a larger mobility footprint, increasing their risk of acquiring the coronavirus.
while the gcg android app was initially hosted on the iisc website due to restrictions by google and apple in hosting covid-related apps on their online app stores, it has recently received approval to be hosted on the google play store, with v . currently available there since early august, . an ad hoc ios version is also being tested since the last week of august, . while gcg is designed for institutional use, contact tracing for users from the same institutions who interact outside the campus is also captured. this benefit can be further enhanced through a federated deployment for institutions that are spatially close to each other, such as a cluster of college campuses and software tech-parks in the same neighborhood. here, the chances of physical interaction between users from different organizations are high, e.g., visiting the same local cafeteria or grocery store. in this federated deployment (fig. ) , individual institutions would maintain their independent gcg deployments. but in addition, they would share the strictly anonymized contact graph for their institution with a trusted data broker, such as a non-profit agency or a neutral university. this data broker would then stitch these graphs together based on contacts between unique device ids that span graphs from different institutions. this can then be used to trigger "glocal" analytics-a global combination of local clusters that are near each other-and share more accurate proximity scores with the users of individual institutions, as well as perform more effective contact tracing across institutions in the same community. a key requirement to preserving privacy is that no personal data should be shared with this trusted broker, and any de-anonymization for contact tracing should strictly be handled at the local institution. this can further be complemented through the use of national or regional-scale contact tracing apps, even if used by a smaller fraction of users who are mobile. this can help link clusters of gcg contacts within institutions, and allow with contact tracing beyond the institutional premises as well. however, care should be taken to sandbox the regional and institutional datasets to avoid privacy loss. the availability of fine-grained contact tracing data has opened opportunities for new research on infection spreading. classic epidemiological models are compartmentalized formulations that classify the population into different states such as s (susceptible), e (exposed), i (infected), and r (removed/recovered). based on the progression patterns of a disease, different models such as si, sis, sir, and seir models , , , have been proposed. these models are applicable to large populations and can estimate the time evolution of the fraction of individuals in different states over time and can identify the peak number of infections for different reproduction numbers. the assumptions in these models are, however, coarse and their utility is hence limited. they can be used to take higherlevel policy decisions such as deciding the duration of lockdowns, planning hospital bed-capacity over time, etc. however, the input data for these models are tightly related to the testing rates, which in the case of covid- was very low during the initial few months. research in the past two decades has extended such compartmentalized models to static or timevarying contact networks , , , . 
research in the past two decades has extended such compartmentalized models to static or time-varying contact networks. in a static network, a node, if infected, can potentially infect any other nodes that it comes in contact with, regardless of the time of contact. but in dynamic networks, temporal ordering is preserved. that is, if an individual a comes in contact with a person b before b and c interacted, then a faces no risk from c. this can correct for the over-prediction of infection rates from static models. with bluetooth-based mobile contact tracing, it is possible to include both the duration of contact and the signal strength, which is a proxy for the distance between the phone users during their interaction, to make better predictions of the transmission rates. results from simulated experiments by kretzschmar et al. indicate reduced reproduction numbers when contact tracing is performed using mobile apps, as the delay in alerting vulnerable individuals is reduced to a minimum. apart from identifying primary and higher-order contacts quickly, contact data allow us to identify the most vulnerable users through either simulations of network models assuming hypothetical initial conditions or centrality measures. most centrality scores from network science are defined on static graphs, and it would be interesting to develop better centrality measures that can be used to find the nodes with higher spreading capabilities in a temporal network. identifying such individuals can in turn be used to devise adaptive testing and vaccination strategies, which can help improve the estimates of the health states of the population, especially when testing is expensive or its availability is limited. another major opportunity with centralized contact tracing is the ability to influence social distancing behavior using alerts and scores. creating control groups, providing such information to one of them, and observing their contact patterns for a limited subsequent period can shed light on the effect of such scores. such randomized controlled trials can help quantify the effectiveness of contact tracing apps even in the absence of covid- case data.
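a minimal sketch (python) of the time-respecting contact semantics described above: infection can only travel along contacts ordered forward in time, so a static graph over-counts risk. the contact tuples are illustrative and simpler than the duration- and rssi-weighted edges gcg records.

def time_respecting_reachable(contacts, source, t0=0):
    """contacts: list of (time, u, v); infection can only move forward in time."""
    infected_at = {source: t0}
    for t, u, v in sorted(contacts):           # process contacts chronologically
        for a, b in ((u, v), (v, u)):
            if a in infected_at and infected_at[a] <= t:
                infected_at.setdefault(b, t)   # b becomes reachable at time t
    return infected_at

# a meets b at t=5, but b met c earlier at t=2: c is *not* reachable from a
contacts = [(2, "b", "c"), (5, "a", "b")]
print(time_respecting_reachable(contacts, "a"))   # {'a': 0, 'b': 5}
# a static graph would (wrongly, for a) also include c via the a-b-c path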
one of the key challenges with digital contact tracing is user adoption. as highlighted in sect. , digital contact tracing requires a large fraction of users within the community to use it before it becomes effective. having only a small sample of individuals use the app makes it difficult to identify the true sources of infection, because of which paths between infected individuals and their primary and higher-order contacts may go undetected. however, our experience with institutional-level contact tracing appears more promising than that of apps deployed by governments at a national level, in terms of the fraction of users installing an app and the duration for which they had it installed on their phones. in fact, recent reports indicate that even % user adoption of contact tracing apps can have a meaningful impact of a - % reduction in covid infections and deaths. that said, not all workplaces are captive environments. in such cases, neighborhood or regional deployments of contact tracing apps may be required, since users are more likely to interact with people outside their cluster. further, people may also interact during activities outside their workplaces, and their institutional contact tracing app can be ineffective during these periods. we frequently observe app users turn off their bluetooth or gps, because of which the contact trace data collected are curtailed. users may do so to save battery (even though our experience shows that the android app consumes less than % of battery in an entire day) or when they perceive a lower risk based on their current activity and environmental conditions. these factors can dramatically offset the promises offered by network-based epidemiological models in identifying risk-prone individuals and in contact tracing to contain the spread of infection. it is also extremely difficult to impute such missing data, and no assumption can be confidently justified. although digital contact tracing apps have several potential advantages, validating their usefulness is difficult. the difference between digital and manual contact tracing can best be demonstrated when there are covid-positive app users who have shared data for continuous periods. in practice, it is wise to use data from such tools in conjunction with manual contact tracing, since there will be gaps in the data due to user behavior or technology limitations. building robust epidemiological models is all the more challenging because they contain several parameters that have to be calibrated from sparse and missing data. heavy reliance on digital contact tracing apps can also exclude fractions of the community who use feature phones. visitors to institutions, such as delivery providers, can also be missed but can contribute to virus spread. digital contact tracing is still in its infancy. it is important that individuals understand the data shared, the risks, and the benefits before fully using such apps. communicating these details to a lay audience can be challenging, and misconceptions about what such apps collect and can do are not uncommon. in this article, we have described the various dimensions of digital contact tracing for managing the covid- pandemic. we have highlighted the approaches taken by diverse apps globally and their pros and cons. we have proposed gocoronago as an institutional contact tracing app whose design choices attempt to balance the privacy of individuals with the safety of the community in performing rapid multi-hop contact tracing. we have offered a detailed technical description of the gcg app, its backend services, and analytics. this platform is currently being validated at the iisc university campus, with additional campus deployments underway. we have shared our early experiences with the deployment over the past few months, in the midst of the covid- epidemic, and the opportunities and challenges that lie ahead. given the evolving nature of covid- , our continued experience with this contact tracing platform at iisc and other campuses can serve as a role model, or a cautionary tale, in managing the pandemic in the ensuing months and years. a proposal for research using the data collected from the app is currently under review by ihec. the authors are also grateful for valuable inputs from dr. olinda timms from st. johns research institute and prof. mukund thattai from ncbs on the design of the contact tracing protocol to balance safety and privacy. a special thanks to crossbow labs for their pro bono security testing services. received: august; accepted: september.
advisory on social distancing measure in view of spread of covid- disease. tech rep.
world health organization (who) ( ) contact tracing in the context of covid- : interim guidance
world health organization (who) ( ) coronavirus disease (covid- ) advice for the public
coronavirus: people-tracking wristbands tested to enforce lockdown
digital tools for covid- contact tracing: annex: contact tracing in the context of covid- . tech rep.
google and apple ( ) exposure notifications: using technology to help public health authorities fight covid-
centers for disease control and prevention (cdc) ( ) social distancing
modeling the combined effect of digital exposure notification and non-pharmaceutical interventions on the covid- epidemic in washington state
a survey of covid- contact tracing apps
infectious diseases of humans: dynamics and control
incubation period of novel coronavirus ( -ncov) infections among travellers from wuhan
network science
an overview of mobile applications (apps) to support the coronavirus disease- response in india
bluetrace: a privacy-preserving protocol for community-driven contact tracing across borders
indoor distance estimated from bluetooth low energy signal strength: comparison of regression models
bluetooth sig to extend reach of covid- exposure notification systems
automated and partly automated contact tracing: a systematic review to inform the control of covid-
why the nhs covid- contact tracing app failed
covid- contact tracing apps reach % adoption in most populous countries
physical distancing, face masks, and eye protection to prevent person-to-person transmission of sars-cov- and covid- : a systematic review and meta-analysis
social distancing: the science behind reducing from two metres to one metre. independent
trace together token: teardown and design overview
inferring distance from bluetooth signal strength: a deep dive
editorial board ( ) much-hyped contact-tracing app a terrible failure. the sydney morning herald
quantifying sars-cov- transmission suggests epidemic control with digital contact
the effect of network topology on the spread of epidemics
temporal dynamics in viral shedding and transmissibility of covid-
critical mass of android users crucial for nhs contact-tracing app. the guardian
demographic structure and pathogen dynamics on the network of livestock movements in great britain
a contribution to the mathematical theory of epidemics
mathematics of epidemics on networks
contact-based model for epidemic spreading on temporal networks
impact of delays on effectiveness of contact tracing strategies for covid- : a modelling study
effectiveness of isolation, testing, contact tracing, and physical distancing on reducing transmission of sars-cov- in different settings: a mathematical modelling study
coronavirus contact tracing: evaluating the potential of using bluetooth received signal strength for proximity detection
decentralized is not risk-free: understanding public perceptions of privacy-utility trade-offs in covid- contact-tracing apps
covid- mortality is negatively associated with test number and government effectiveness
accuracy of bluetooth-ultrasound contact tracing: experimental results from novid ios version . using -year-old phones
a computer oriented geodetic data base and a new technique in file sequencing
covid- and your smartphone: ble-based smart contact tracing
no, coronavirus apps don't need % adoption to be effective
use of social network analysis to characterize the pattern of animal movements in the initial phases of the foot and mouth disease (fmd) epidemic in the uk
uber removes racy blog posts on prostitution, one-night stands
the pact protocol specification. private automated contact tracing team
mobile location data and covid- : q&a
the epi info viral hemorrhagic fever (vhf) application: a resource for outbreak data management and contact tracing in the - west africa ebola epidemic
covid- digital contact tracing: apple and google work together as mit tests validity
simulation of an seir infectious disease model on the dynamic contact network of conference attendees
aerosol and surface stability of sars-cov- as compared with sars-cov-
virus spread in networks
epidemic spreading in real networks: an eigenvalue
an individual-based approach to sir epidemics in contact networks
china's virus apps may outlast the outbreak, stirring privacy fears
the authors acknowledge a research grant from the department of science and technology (dst), government of india, to partly sponsor this work (grant no. dst/icps/ rakshak/ ). they also recognize the support offered by the rakshak review committee. yogesh simmhan was supported by the swarna jayanti fellowship (grant no. dst/sjf/ eta- / - ). the authors thank the administration of iisc for assistance with the development and deployment of gcg, the members of the institute who volunteered to test early versions of the app, and prof. y. narahari who offered valuable guidance to the project. the authors are grateful for the detailed feedback offered by the institute human ethics committee (ihec) at iisc in designing the operations and the research study.
key: cord- -b bla fp authors: mcfate, clifton; kalyanpur, aditya; ferrucci, dave; bradshaw, andrea; diertani, ariel; melville, david; moon, lori title: skate: a natural language interface for encoding structured knowledge date: - - journal: nan doi: nan sha: doc_id: cord_uid: b bla fp
in natural language (nl) applications, there is often a mismatch between what the nl interface is capable of interpreting and what a lay user knows how to express. this work describes a novel natural language interface that reduces this mismatch by refining natural language input through successive, automatically generated semi-structured templates. in this paper we describe how our approach, called skate, uses a neural semantic parser to parse nl input and suggest semi-structured templates, which are recursively filled to produce fully structured interpretations. we also show how skate integrates with a neural rule-generation model to interactively suggest and acquire commonsense knowledge. we provide a preliminary coverage analysis of skate for the task of story understanding, and then describe a current business use-case of the tool in a specific domain: covid- policy design.
interactive natural language applications typically require mapping spoken or written language to a semi-formal structure, often represented using semantic frames with fillable slots. this approach has been used in popular commercial spoken dialogue systems (e.g., google's dialogflow and amazon's alexa skills) through the developer-defined "intents."
frame semantic parsing more broadly (e.g., gildea and jurafsky ) has demonstrated benefit in a number of downstream applications including dialogue systems (chen, wang, and rudnicky ) and question answering (shen and lapata ) . despite advances in frame semantic parsing (e.g., swayamdipta et al. ) , no semantic parser is perfect. accordingly, developers of natural language interfaces must carefully curate correction dialogues to avoid frustrating interactions. this sort of mismatch between system and user expectations is what we aim to resolve with skate (structured knowledge acquisition and extraction). in skate, a user's text is parsed in real time as they type. the resulting partial semantic structures can be completed with additional required slots and fillers, and are then recursively refined by the user through micro-dialogues. at any point, the user can continue to give structured interpretations for a slot filler (e.g., a complex noun phrase), or they copyright © , association for the advancement of artificial intelligence (www.aaai.org). all rights reserved. can leave it in unstructured form for the system to interpret later. in the following sections, we first walk through the skate architecture using an exemplar interaction from an open-domain rule learning task. we then summarize our implementation of the core skate components. we demonstrate how skate has been integrated with a natural language rule generation model to interactively acquire structured rules for story understanding, and conclude with a current application that uses skate to build covid- policy diagrams. the skate architecture ( figure ) is built around an interaction model of recursively: recognizing a concept, producing a partially interpreted template to instantiate the concept, and allowing a user to refine the template. the result is text annotated with semantic frames. these frames are processed by the downstream application. each interaction begins by selecting a top-level, application-specific semantic template. as an example, a top-level frame for rule acquisition may be an "if/then" construction as in the first pane of figure . these top-level templates provide the initial scope of interaction, and can be used to apply additional application-specific semantics as needed (e.g. if/then could produce a causal rule while after/then might only imply temporal sequence). as a user fills a slot, the concept recognizer processes their text, selects a lexical trigger, and instantiates possible semantic frames for that trigger. the frames are in the style of framenet (ruppenhofer et al. ) : each defines a concept, a set of possible trigger phrases, and semantic arguments that may be instantiated as text spans. at each interaction, we use syntactic heuristics to select the trigger with the widest syntactic scope. for each instantiated frame, the template renderer receives the interpretations (frame predicates and optionally argument labels/spans) from the concept recognizer and decorates that information (e.g., by adding display texts, examples etc) to send to the front-end ui. it also presents the user with options for what frame to assign as the word sense of the trigger. for example, in the second pane of figure , the template generator has built frame assignment options for the word "take." the resulting micro-dialogue is presented to the user. once an option is selected, the corresponding template is displayed, and the user can recursively refine unstructured slot fillers as in the third pane of figure . 
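a minimal sketch (python) of the kind of recursive frame/slot structure this interaction produces; the frame names, slot names, and example rule below are invented for illustration and are not skate's actual vocabulary.

from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class FrameInstance:
    frame: str                                   # e.g. "if-then", "giving"
    # a slot filler is either raw user text (left unstructured) or a nested frame
    slots: Dict[str, Union[str, "FrameInstance"]] = field(default_factory=dict)

    def unresolved_slots(self):
        # slots still holding plain text are candidates for further micro-dialogues
        return [name for name, filler in self.slots.items() if isinstance(filler, str)]

rule = FrameInstance("if-then", {
    "condition": FrameInstance("giving", {"donor": "a person", "theme": "a cookie"}),
    "consequence": "they feel happy",            # left unstructured by the user
})
print(rule.unresolved_slots())                   # ['consequence']

the point of the structure is that refinement is optional: any slot can stay as raw text and be revisited later, which is exactly the behavior described next.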
the user can also choose to leave slots as unstructured text. for instance, in figure , the user may not need to specify the desired sense of "cookie,", and the entry can be submitted without full specification (in which case, uninterpreted tokens become placeholders/variables in the underlying semantic representation, and can be refined at a later point). note that in many domains, it is necessary to solicit extra information from a user given an evoked frame. when instantiating templates, required roles that remain unfilled can be added to the template to appear as blank slots for the user to specify (and must be filled in before the user submits). additionally, likely roles suggested by context can be added and optionally deleted. once a user is satisfied with what they have typed, they can submit the entry. the set of composed frames can be further processed by the application-specific semantic converter if necessary. for example, in the rule application and covid- policy builder, the resulting frames are turned into a set of horn clause-like statements. in this section, we briefly describe the core skate components . our concept vocabulary is organized such that each predicate corresponds to a frame. all frames minimally possess the "focal" role which corresponds to the lexical trigger for the frame, though they may have additional optional and required roles. frames are stored in an inheritance hierarchy, allowing multiple inheritance. as a domain general starting point, we have created a frame ontology called hector, derived from framenet (rup-penhofer et al. ) and the new oxford american dictionary (stevenson and lindberg , noad) . these two resources are complementary: framenet has broad coverage for multi-arity relations, while noad has a large library of lexical concepts (entities, attributes, etc.). the hector ontology can easily be pruned into subsets for specific domains and/or expanded with novel concepts. defining a new frame requires, minimally, defining its roles, writing a short definition or example, and optionally positioning it in the existing frame hierarchy. skate's performance improves with annotated examples, but they are not required, and as discussed in the next subsection, skate can generate its own training data as a new frame is selected by the user and elaborated upon in skate interactions. the concept recognizer component consists of two semantic parsers. the first, spindle (kalyanpur et al. ) , is a transformer-based neural semantic parser. this model can be fine-tuned using a corpus, but requires annotated data. the second parser acts as a fallback and is used when spindle returns no results or low confidence frame interpretations. it is based on an unsupervised approach that retrieves k nearest frames based on an embedding match between the sentence typed so far and potential frame embeddings, the latter being generated from minimal frame annotated examples pre-specified by a domain author . as skate is used in an application, the corrected output of the second parser becomes training data to improve the first. thus, skate is able to improve with use. we have developed a neural semantic parser called spindle that treats frame parsing as a multi-task problem involving related classification and generation tasks. given a sentence and a frame-triggering span, the model decomposes parsing into frame-sense disambiguation (multi-label classification), argument span detection (generation), and rolelabeling (classification). 
since these tasks are related, spindle uses a joint multi-task encoder-decoder architecture (see figure ), where the encoder layer is shared among the various tasks, with different decoders used depending on the task type. the model is trained on k annotated frame sentences (available in framenet and noad) by fine-tuning a pretrained, transformer-based language model such as gpt (radford et al. ) or t (raffel et al. ). the spindle model achieved the best results using t as the base encoder/decoder, with a frame sense disambiguation accuracy of % and a span detection/role labeling f score of %. even though the parser was trained on full sentences, we have found that it returns results with high accuracy when run on partial sentences like those typed in skate. moreover, as the user continues to type text, the parsing results change to consider the additional context, which helps to disambiguate the correct frame sense. embedding-based heuristic parsing. to complement the neural semantic parser, which needs many annotated examples for training, we have developed an unsupervised, k-nn-based approach for frame parsing that can work with a handful of examples per frame. the approach first computes a frame embedding by aggregating glove (pennington, socher, and manning ) embeddings for the trigger lemmas (which are specified in the frame definition) and the content words in frame examples. our tool then sums the glove embeddings for all words in the sentence to produce a sentence embedding, and computes the similarity between the frame and sentence embeddings. the algorithm also detects argument spans using syntactic heuristics based on a dependency parse of the sentence. finally, it assigns a role to each span by considering how well the type of the span phrase matches the expected role type as inferred from frame examples (type similarity checking is also done using embeddings).
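a minimal sketch (python/numpy) of the unsupervised fallback just described: score frames by cosine similarity between an averaged word-embedding of the user's sentence and averaged embeddings of each frame's triggers and examples. the tiny 3-d "embeddings" and frame names below are stand-ins for real glove vectors and the hector vocabulary.

import numpy as np

emb = {"buy": np.array([1.0, 0.2, 0.0]), "purchase": np.array([0.9, 0.3, 0.1]),
       "walk": np.array([0.0, 1.0, 0.2]), "store": np.array([0.5, 0.1, 0.9])}

def avg_embedding(words):
    vecs = [emb[w] for w in words if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

frames = {"commerce-buy": ["buy", "purchase"], "self-motion": ["walk"]}
frame_vecs = {f: avg_embedding(ws) for f, ws in frames.items()}

def top_k_frames(sentence, k=2):
    s = avg_embedding(sentence.lower().split())
    return sorted(frame_vecs, key=lambda f: cosine(s, frame_vecs[f]), reverse=True)[:k]

print(top_k_frames("i purchase food at the store"))  # ['commerce-buy', 'self-motion']

because this matcher needs only a few example words per frame, its corrected outputs can later serve as training data for the supervised parser, as noted above.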
the result of a submitted entry is a possibly incomplete frame-semantic parse of the input text. the semantic converter can also contain domain-specific logic to further convert the frame semantic interpretation into usable data for a downstream application (as described in the domain adaptation section). story understanding. skate has been applied for open-domain structured rule acquisition. the task is: given a short story and a question, provide a rule or set of rules with which the answer can be derived from the story. using skate, we can collect structured formal rules usable by a downstream reasoning engine. as described above, skate templates are meant to guide the user both with explicit structure (e.g., slots) and, optionally, with unstructured slot-fillers. these unstructured fillers can be used to guide the user to submissions with high-confidence semantic parses or towards prototypical examples. for this task we integrate skate with a neural unstructured rule prediction system to guide the user towards general, syntactically simple, rules. glucose (generalized and contextualized story explanations; mostafazadeh et al. ) is a crowd-sourced dataset of common-sense explanatory knowledge. glucose defines ten dimensions of causal explanation, focusing on events, states, motivations, emotions, and naive psychology. the glucose dataset consists of both general and specific semi-structured inference rules that apply to short children's stories. these rules were acquired via crowd-sourcing, and mostafazadeh et al. ( ) demonstrated that neural models trained on these semi-structured rules could be used to produce human-like inferences for story understanding. following mostafazadeh et al. ( ), we train an encoder-decoder rule generation model. for each sentence in a story, we use the glucose-trained model to predict unstructured textual causal inferences. these uninterpreted inferences are then used to seed slots in skate rule templates, guiding the user towards high-likelihood story-relevant rules (see figure ). the glucose-trained model can also be used for autocomplete suggestions in skate. as the user types text in one of the structured template slots, we run the model on the text typed so far (i.e., in earlier slots of the template) and generate potential completions. a novel feature of skate is that we use the already specified frame semantics to filter out incompatible language model suggestions. for example, say the user is providing knowledge about a soccer story, and starts typing: "if a player gets" and specifies the interpretation for the verb "get" as the frame arriving-at-a-location. at this point, the frame template has an unfilled slot for "destination". suppose the user continues by typing text in this slot, and we use the glucose model to generate completions; it may produce the following alternatives given the prior text "if a player gets": "..a ball", "..to the goal", "..into trouble". however, because the user has specified the frame semantics for "get" and the active slot is "destination", the only compatible suggestion is "..to the goal". to identify compatible suggestions, we run the spindle semantic parser on the full generated completion (including the prior text) and filter out suggestions where the frame doesn't match the prior specified frame. in the above example, we would throw out "gets a ball" (where get means acquire) and "gets into trouble" (where get means transition-to-state), since they do not match the earlier specified interpretation of "get" (arrive-at-location). we believe that suggesting text completions that are consistent, valid semantic interpretations given the prior context is unique to skate.
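a minimal sketch (python) of the filtering step described above; generate() and parse_frame() are toy stand-ins for the glucose-trained generator and the spindle parser, not real apis.

def generate(prefix):
    return [prefix + s for s in [" a ball", " to the goal", " into trouble"]]

def parse_frame(sentence):
    # toy disambiguation of "get" by its complement
    if " to " in sentence: return "arrive-at-a-location"
    if " into " in sentence: return "transition-to-state"
    return "acquire"

def compatible_completions(prefix, committed_frame):
    return [c for c in generate(prefix) if parse_frame(c) == committed_frame]

print(compatible_completions("if a player gets", "arrive-at-a-location"))
# ['if a player gets to the goal']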
as a preliminary coverage evaluation, we asked domain experts to use the tool to encode knowledge needed to answer and explain commonsense questions generated from children's stories. the questions and required rules were generated in english as a part of several manually created story understanding rubrics (dunietz et al. ). our data set consisted of target natural language rules from children's stories. rules could range in complexity from simple attributive statements (e.g. "often, a house has a yard.") to complex script-like statements (e.g. "if a person plays soccer and the person belongs to a team and the person moves the ball to the goal then the team gets a point."). to test declarative statements (factoids), we additionally asked the annotators to enter, exactly or as a paraphrase, sentences from an additional stories. annotators were trained on how to use the skate interface and then, for each rule or statement, they rated how close in intended meaning their resulting entry was to the original nl expression on a scale from - ( = not close; = substantial deviation; = minor deviation; = paraphrase). results were promising, with % of entries scoring or higher, including several complex constructions involving nested clauses, conjunctions and negation. some high scoring examples are shown in table . the main gaps were missing frames from the target ontology (hector).
table : knowledge target | skate input (score)
people generally want to eat food that is tasty | often people want to eat tasty food ( )
when a larger animal approaches a smaller animal, the smaller animal might get afraid | often when animal approaches animal and size of animal is greater than size of animal , animal feels fear ( )
when one person helps another, the person being helped thanks the helper | often when person helps person , then person thanks person ( )
if something is not obscured behind another object, it can be seen | if object does not cover object , then someone can see object . ( )
if someone doesn't know something, and someone else tells them, then they know what it is | if person does not know a fact and person tells person the fact, then person learns the fact ( )
as the world recovers from covid , many institutions have been required to define robust facility access policies. these policies can be complicated, often with many branching conditions (e.g. lists of symptoms) and potential actions for a user to complete (e.g. various policy-compliant covid tests). automated systems can help guide users through these policies, but the policies must first be formalized. in the following section, we present an application of the skate nli for building domain-specific policy diagrams around access to school facilities. a policy diagram is defined by:
• compliance states: terminal actions, whether a person returns or quarantines.
• intermediate states: states that lead to compliance states or further modify them, e.g. quarantining because a student is symptomatic.
• scenarios: observable states that lead to an intermediate state, e.g. a student experienced a cough and a fever.
• variables: observable from the world, e.g. a person marked on a questionnaire that they experienced a cough.
together these form a flow chart (policy diagram). nodes in the diagram are states, and each type of state (above) is assigned a top-level template to allow a user to define them. compliance and intermediate states can be mapped to a unique frame instance or combination of frames, which allows for compositionality (e.g. quarantine for days / quarantine at home). variables are also compositional (e.g. has a persistent cough) and can be inferred by the system using rules or observed directly through an end-user questionnaire. an example of acquiring and applying a policy diagram is shown in the accompanying figure: the example entry defines a suggested compliance state given an intermediate state. the state is compositional, specifying a population to quarantine from. these conditionals form rules usable by a reasoning system. figure shows a simplified covid policy for returning to school along with the skate statements used to construct the policy. in this example, quarantining (from school) and returning (to school) have been defined as compliance states. other rules append adjuncts (optional roles) to a state (e.g. duration) when it is evoked, to further specify it. thus, we can define conditions that lead to or day quarantines based on whether the student was exposed or symptomatic, respectively. we also define intermediate states (exposed and symptomatic) to intuitively provide reasons for suggesting a quarantine. these hold given combinations of observable facts, which can be set through a daily questionnaire. in our representation, duration adjuncts on states map to counters which can align to a calendar.
thus, we can chart when a student will be able to return to school. the world state (i.e. a specific scenario) is also defined in skate (as shown in the figure) . interestingly, in this example, "mary and bobby were in class.." is interpreted as colocation in skate, which is used to infer contact between the two via background knowledge in the ontology. given a world state (defined in skate), an administrator can query the graph to determine which students are in which compliance states. in the example, the system correctly infers that bobby has days left to quarantine on / , while mary only has days left. figure : an example policy pertaining to school access. queries can be issued against the graph given a world state to determine compliance. here, a query for compliance states reveals two students currently under quarantine natural language knowledge capture has long been a goal in ai, and interest has only grown with the advent of crowd sourcing platforms like amazon's mturk. our approach draws inspiration from and improves upon this research. conceptnet (speer, chin, and havasi ) started as the open mind common sense crowd-sourcing effort (singh et al. ) which solicited natural language common sense statements. while the omcs knowledge acquisition interface could make use of semi-structured templates, their captured knowledge remains as natural language and they do not further decompose an entry into semantic forms. their approach additionally used generated natural language inferences for user feedback. this plays a similar role to our auto-complete feature, though their feedback is presented after the fact rather than as inline guidance. learner (chklovski ) uses cumulative analogies to gather new information from conceptnet like statements (e.g. newspapers have pages) via answerable questions (e.g. do books also have pages?). learner builds on that design by adding templates with slots for a small set of target top-level relations (chklovski ) . they also generate slots to enumerate an entry, however, much like omcs, they do not further refine input text with templates. our approach leverages recent advances in language modeling to generate templates from user text and to provide unstructured guidance. recently (gopinath et al. ) presented a "contextual auto complete" approach for clinical documentation which used a completion mechanism to disambiguate clinical concepts and create annotated notes. in contrast, our completion mechanism (templates and unstruc-tured text) is far broader in scope (interpreting the full text) and depth of representation (compositional frames). while great advances will continue to be made in the field of semantic parsing, it is highly unlikely that any parser will always perform perfectly. as such, even when a natural language application is capable of a desired behavior, lay users face uncertainty and obstruction when their requests are wrongly interpreted. skate is a natural language interface that reduces the mismatch between system ability and lay user expectation by interactively guiding them towards a structured representation. our approach combines frame-based kr with a hybrid semantic parsing approach to construct interpretations with both structured (i.e. template slots) and unstructured (i.e. textual slot fillers) content. a novel aspect of the hybrid parsing approach is its potential to automatically improve with use, since the unsupervised embedding based parser acts as a vehicle to collect training data for the supervised model. 
furthermore, the use of a neural rule generation model to produce semantically valid auto-completions is a novel and significant feature from a usability standpoint. we have demonstrated the utility of the skate nli in both an open domain task (story understanding) and in a highly specialized domain (building policy diagrams). we plan to host a public endpoint demonstrating skate shortly. many challenges still remain as we integrate skate into an end-user application, e.g., we are exploring ways to allow users to create new frames and/or slots on the fly, when the pre-defined vocabulary is insufficient.
unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing
learner: a system for acquiring commonsense knowledge by analogy
designing interfaces for guided collection of knowledge about everyday objects from volunteers
to test machine comprehension
automatic labeling of semantic roles
fast, structured clinical documentation via contextual autocomplete
spindle: open-domain semantic parsing using pre-trained transformers
glove: global vectors for word representation
language models are unsupervised multitask learners
exploring the limits of transfer learning with a unified text-to-text transformer
framenet ii: extended theory and practice
using semantic roles to improve question answering
open mind common sense: knowledge acquisition from the general public
conceptnet . : an open multilingual graph of general knowledge
new oxford american dictionary, third edition
frame-semantic parsing with softmax-margin segmental rnns and a syntactic scaffold
key: cord- -rhcvsqtk authors: welch, charles; lahnala, allison; pérez-rosas, verónica; shen, siqi; seraj, sarah; an, larry; resnicow, kenneth; pennebaker, james; mihalcea, rada title: expressive interviewing: a conversational system for coping with covid- date: - - journal: nan doi: nan sha: doc_id: cord_uid: rhcvsqtk
the ongoing covid- pandemic has raised concerns for many regarding personal and public health implications, financial security and economic stability. alongside many other unprecedented challenges, there are increasing concerns over social isolation and mental health. we introduce expressive interviewing, an interview-style conversational system that draws on ideas from motivational interviewing and expressive writing. expressive interviewing seeks to encourage users to express their thoughts and feelings through writing by asking them questions about how covid- has impacted their lives. we present relevant aspects of the system's design and implementation as well as quantitative and qualitative analyses of user interactions with the system. in addition, we conduct a comparative evaluation with a general purpose dialogue system for mental health that shows our system's potential in helping users to cope with covid- issues.
the covid- pandemic has changed our world in unimaginable ways, dramatically challenging our health system and drastically changing our daily lives. as we learned from recent large-scale analyses that we performed on social media datasets and extensive surveys, many people are currently experiencing increased anxiety, loneliness, depression, concerns for the health of family and themselves, unexpected unemployment, increased child care or homeschooling, and general concern with what the future might look like.
research in expressive writing (pennebaker, b) and motivational interviewing (miller and rollnick, ) has shown that even simple interactions where people talk about one particular experience can have significant psychological value. numerous studies have demonstrated their effectiveness in improving peoples mental and physical health (vine et al., ; pennebaker and chung, ; resnicow et al., ) . both expressive writing and motivational interviewing rely on the fundamental idea that by putting emotional upheavals into words, one can start to understand them better and therefore gain a sense of agency and coherence of the thoughts and emotions surrounding their experience. in this paper, we introduce a new interview-style dialogue paradigm called expressive interviewing that unites strategies from expressive writing and motivational interviewing through a system that guides an individual to reflect on, express, and better understand their own thoughts and feelings during the pandemic. by encouraging introspection and selfexpression, the dialogue aims to reduce stress and anxiety. our system is currently online at https://expressiveinterviewing.org and available for anyone to try anonymously. expressive writing. expressive writing is a writing paradigm where people are asked to disclose their emotions and thoughts about significant life upheavals. originally studied in the scope of traumatic experiences (pennebaker and beall, ) , study participants are usually asked to write about an assigned topic for about minutes for one to five consecutive days. later studies expanded to specific experiences such as losing a job (spera et al., ) . expressive writing has been shown to be effective on both physical and mental health measures by multiple meta-analyses (frattaroli, ) , finding its association with drops in physician visits, positive behavioral changes, and longterm mood improvements. no single theory at present explains the cause of its benefits, but it is believed that the process of expressing emotions and constructing a story may play a role for participants in forming a new perspective on their lives (pennebaker and chung, ) . motivational interviewing. motivational interviewing (mi) is a counseling technique designed to help people change a desired behavior by leveraging their own values and interests. the approach accepts that many people looking for a change are ambivalent about doing so as they have reasons to both change and sustain the behavior. therefore, the goal of an mi counselor is to elicit their client's own motivation for changing by asking open questions and reflecting back on the client's statements. mi has been shown to correlate with positive behavior changes in a large variety of client goals, such as weight management (small et al., ) , chronic care intervention (brodie et al., ) , and substance abuse prevention (d'amico et al., ) . dialogue systems. with the development of deep learning techniques, dialogue systems have been applied to a large variety of tasks to meet increasing demands. in recent work, afzal et al. ( ) built a dialogue-based tutoring system to guide learners through varying levels of content granularity to facilitate a better understanding of content. henderson et al. ( ) applied a response retrieval approach in restaurant search and booking to provide and enable the users to ask various questions about a restaurant. ortega et al. ( ) built an open-source dialogue system framework that navigates students through course selection. 
there are also dialogue system building tools such as google's dialogflow and ibm's watson assistant, which enable numerous dialogue systems for customer service or conversational user interfaces. chatbots for automated counseling. two dialogue systems for automated counseling services available on mobile platforms are wysa and woebot. these chatbots provide cognitive behavioral therapy with the goal of easing anxiety and depression by allowing users to express their thoughts. a study of wysa users over three months showed that more active users had significantly improved symp-toms of depression (inkster et al., ) . another study shows that young students using woebot significantly reduced anxiety levels after two weeks of using the conversational agent (fitzpatrick et al., ) . these findings suggest a promising benefit of automated counseling for the nonclinical population. our system is distinct from wysa and woebot in that it is designed specifically for coping with covid- and allows users to write more topic related free-form responses. it asks open-ended questions and encourages users to introspect, and then provides visualized feedback afterward, whereas the others have a conversational logic mainly based on precoded multiple choice options. our system conducts an interview-style interaction with the users about how the covid- pandemic has been affecting them. the interview consists of several writing prompts in the form of questions about specific issues related to the pandemic. during the interview, the system provides reflective feedback based on the user's answers. after the interaction is concluded, the system presents users with detailed graphical and textual feedback. the system's goal is to encourage users to write as much as possible about themselves, building upon previous findings regarding the psychological value of writing about personal upheavals and the use of reflective listening for behavioral change (pennebaker, b; miller and rollnick, ) . to achieve this, the system guides the interaction by asking four main open-ended questions. then, based on users responses, the system provides feedback and asks additional questions whenever appropriate. in order to provide reflective feedback, the system automatically detects the topics being discussed (e.g., work, family) or emotions being felt (e.g., anger, anxiety), and responds with a reflective prompt that asks the user to elaborate or to answer a related question to explore that concept more deeply. for instance, if the system detects work as a topic of interest, it responds with "how has work changed under covid? what might you be able to do to keep your career moving during these difficult times?" during the formulation of the guiding questions used by our system, we worked closely with our psychology and public health collaborators to identify a set of questions on covid- topics that would motivate individuals to talk about their personal experience with the pandemic. we formulated the following question as the system's conversation starting point: [major issues] what are the major issues in your life right now, especially in the light of the covid outbreak? we also formulated three follow-up questions, which were generated after several refining iterations. the order of these questions is randomized across users of the system. [looking forward] what do you most look forward to doing once the pandemic is over? 
[advice to others] what advice would you give other people about how to cope with any of the issues you are facing? [grateful] the outbreak has been affecting everyone's life, but people have the amazing ability to find good things even in the most challenging situations. what is something that you have done or experienced recently that you are grateful for? our system's capability for language understanding relies on identifying words belonging to various lexicons. this simple strategy allowed us to quickly develop a platform upon which we intend to implement a more sophisticated language understanding ability in future work. when a user responds to one of the main prompts, the system looks for words belonging to specific topics and word categories. the system examines the user responses to identify dominant word categories or topics and triggers a reflection from a set of appropriate reflections. if none of these types are matched, it responds with a generic reflection. the word categories are derived from the liwc, wordnet-affect and mpqa lexicons (pennebaker et al., ; strapparava et al., ; wiebe et al., we removed an additional question about how people's lives have changed since the outbreak, as well as a question about what people missed the most about their previous lives. a dominant word category is defined as a word type, where the frequency of occurrence is at least % higher than the second highest frequency category for that group. ) and include pronouns (i, we, others), negative emotion (anger, anxiety, and sadness), positive emotion (joy) and positive and negative words. the covid- related topics include finances, health, home, work, family, friends, and politics. most of the topics are covered by the liwc lexicon, with the exception of politics. for this category, we use the politics category from the roget's thesaurus (roget, ) and add a small number of proper nouns covered in recent news (e.g. trump, biden, fauci, sanders). we formulate a set of specific reflections for each word category and topic, which were refined by our psychology and public health collaborators. for instance, if the dominant emotion category is anxiety, the system responds "you mention feelings such as fear and anxiety. what do you think is the best way for people to cope with these feelings?" initially, we also considered reflections for different types of pronouns, but found that they did not steer the dialogue in a meaningful direction. instead, we flag responses with dominant use of impersonal pronouns and lack of references to the self and reflect that fact back to the user and further ask them how they are specifically being affected. we also crafted generic reflections to be applicable to a large number of situations though the system does not understand the content of what the user has said (e.g. "i see. tell me about a time when things were different", and "i hear you. what have you tried in the past that has worked well"). after the interview, the system provides visual and textual feedback based on the user's responses and provides links to resources (i.e., mental health resources) appropriate given their main concerns. the visual feedback consists of four pie charts showing the relative usage of different word categories, including: discussed topics (work, finance, home, health, family, friends and politics), affect (positive, negative), emotions (anger, sadness, fear, anxiety, joy), and pronouns (i, we, other). 
the textual feedback includes a comparison with others (to normalize the user's reactions) and interpretations of where the user falls within normalized scales. the system also presents a summary of the most and least discussed topics and how they compare to the average user, along with normalized values for meaningfulness, self-reflection, and emotional tone (using a - scale) along with textual descriptors for the shown scale values. these metrics are inspired by previous work on expressive writing and represent the self-reported meaningfulness, usage of self-referring pronouns, and the difference in positive and negative word usage (pennebaker, a) . finally, the system provides relevant resources for further exploration (e.g. for the work topic it lists external links to covid related job resources and safety practices). the system is implemented as a web interface so it is accessible and easy to use. the interface is built with the django platform and jquery and uses python on the backend (django software foundation, ). before the interaction users are asked to report on a - scale: ( ) [life satisfaction] how satisfied they are with their life in general, and ( ) [stress bef ore ] what is their level of stress. the user then proceeds to the conversational interaction with our system. after the interaction, the user is asked again about ( ) ( ) [meaningful] how meaningful their interaction was. once this is submitted, the user can proceed to the feedback page and view details about what they wrote and how their interaction compares to a sample of recent users. the user is finally presented with a list of resources triggered by the topics discussed. we made an effort to make our system appear human-like to make users more comfortable while interacting with it, although this can vary for different individuals. in future work, we hope to explore individual personas and more sophisticated rapport building techniques. we named our dialogue agent 'c.p.', which stands for computer program. this name acknowledges that the user is interacting with a computer, while at the same time it makes the system more human by assigning it a name. when responding to the user, c.p. pauses for a few seconds as if it is thinking and then proceeds to type a response one letter at a time with a low probability of making typos -similarly to how human users would type. after the system was launched (and up to when we conducted this analysis), we had users interact with the system. we analyze these interactions to evaluate system usefulness, user engagement, and reflection effectiveness. system usefulness. we examine the system's ability to help users cope with covid- related issues by analyzing the different ratings provided by users before and after their interaction with c.p. throughout this discussion, we use ∆stress to indicate how the users stress rating differs before and after the interaction: ∆stress = stress af ter -stress bef ore . negative values for ∆stress are therefore an indicator of stress reduction, whereas positive values for ∆stress reflect an increase in stress. we start by measuring the spearman correlation between the different ratings for the interactions with c.p. results are shown in table . the strongest correlation we observe is between the personal and meaningful ratings, suggesting that interactions that are more meaningful appear to feel more personal, or vice versa. 
we also observe a strong negative correlation between ∆ stress and the meaningfulness of the interaction, suggesting that the interactions that the users found to be meaningful are associated with a reduction in stress. user engagement. we examine user engagement by analyzing the time users spend in the interaction and the number of words they write throughout the session. figure shows histograms of the session lengths in the number of words used by the user and of the session duration in seconds. the rightmost column of table shows spearman correlation coefficients between user ratings and the length and duration of the sessions. we find a significant negative correlation between stress bef ore and stress af ter with session duration and number of words, suggesting an association between user engagement and lower stress. there is also a weak negative correlation between duration of session and reduction in stress (∆stress). we also investigate if there is a relationship between the pre-and post-session ratings and how engaged a user was with each prompt in terms of length of and duration in writing their response. table shows spearman correlation coefficients for these relationships. it appears that life satisfaction has no correlation with the length of any prompt response except a potentially weak negative correlation with length on the major issues prompt (p = . ). a lower rating may relate with having more personal challenges to write about. stress bef ore has a weak negative correlation between the number of words used and the duration spent in the response to looking forward. higher stress may relate to present concerns, which may make one less inclined to spend time thinking and writing about positive aspects of their future than someone with less stress. we presume this could be the case for the grateful prompt, which likewise correlates weakly and negatively with stress bef ore . stress af ter has a negative correlation between duration spent on every prompt response except for the time spent on major issues. this could be a reflection of the fact that those who have a lot to write about major issues in their life also incur high levels of stress. the personal rating shows no correlations with the duration spent on any of responses, except potentially advice to others (p = . ). we do observe weak negative correlations between personal ratings and response lengths on major issues and looking forward, and potentially on grateful (p = . ) and advice to others (p = . ). perhaps if a user writes more, there is a greater expectation for more personal reflections. we discuss engagement related to reflections more deeply in the next section. the meaningful rating shows weak negative correlations with length on major issues, advice to others, and possibly on grateful (p = . ) and looking forward (p = . ). we do not observe a significant correlation with duration on major issues or grateful, but we do observe positive correlations between duration and looking forward and advice to others. users who spend more time thinking about advice they would give others facing their issues may find the interaction more meaningful, and may experience benefits having reflected on their agency in managing their challenges. reflection effectiveness. to investigate the effectiveness of expressive interviewing reflections, we compare the reflections that were triggered for users whose stressed decreased to the reflections that triggered for the users whose stress increased. 
for each of these user groups, we compute the dominance of each reflection as its proportion of times it was triggered out of all reflections triggered. in figure , we compare the dominance of each reflection across these user groups by dividing the reflection dominance in the decreased-stress group by that of the increased-stress group. importantly, we observe that all emotion reflections and more topic reflections were triggered at a higher rate for users whose stress decreased, whereas more generic reflections were triggered at a higher rate for users whose stress increased. while we do not presume that increased stress was due to generic reflections, the correspondence between emotion and topic reflections with stress reduction aligns with expectations of effective reflections from motivational interviewing-generic reflections and specific reflections resemble simple reflections and complex reflections respectively, as referred to in motivation interviewing. while both types of reflections serve a purpose, complex reflections both communicate an understanding of what the client has said and also contribute an additional layer of understanding or a new interpretation for the user, whereas simple reflections focus on the former (rollnick and allison, ) . in qualitatively analyzing the instances where generic reflections were triggered, we observe that contextual appropriateness seems to be the best indicator of their success (in terms of ability to elicit a deeper thought, feeling, or interpretation) given that the user was invested in the experience. as these generic reflections are selected at random, their contextual appropriateness was inconsistent, illuminating the scenarios in which they are more or less appropriate. for instance, out of the seven times the reflection "interesting to hear that. how does what you say relate to your values?" was triggered for the increased-stress users, one user expanded on their previous message, one expressed confusion about the question, and another copied and pasted the definition of core values as their response. two other instances of this reflection were triggered when a user had expressed negative feelings such as worry and feeling lazy which appeared misplaced, and the last case was triggered by a message that was not readable. out of the thirteen times the same reflection was triggered for the decreased-stress group, one user expressed not hav- figure : the dominance of each reflection triggered for users whose stress decreased divided by each reflection's dominance for users whose stress increased. scores above (red line) correspond to a decrease in stress; score below correspond to an increase in stress. see table for sample reflections, including the generic reflections. ing much to say, another gave one word responses before and after, and all others expanded on their previous message in relation to their values or gave a simple response to indicate a degree that it relates. this reflection appeared more "successful" (based on if the user expanded on their previous message or values) when it was triggered by a message with more neutral to positive sentiment, such as when the user was expressing what they were looking forward to, or when they had several pieces of advice to offer for a friend in their situation, as opposed to one with more negative sentiment like the messages expressing worry or laziness. 
in instances of other generic reflections, we observed that another issue for appropriateness was whether the reflection matched the user's frame of thought in terms of past, present, or future. for instance, the reflection "i see. tell me about a time when things were different," best matched scenarios when users described thoughts about changes to their daily lives, but not when users described future topics such as what they were looking forward to, nor when they were already describing the past. based on our observations of the reflections in action, we have three main takeaways. first, topic- and emotion-specific reflections are more associated with the group of users whose stress decreased. these reflections are only triggered if the system determines a dominant topic or emotion, which depends on the effectiveness of its heuristics, as well as the amount of detail and context that a user expresses. this leads to the next takeaway, that the system appears to be more effective when users approach the experience with an intention to express themselves, and conversely seems less effective when the intent not to engage and express is explicit. third, the generic reflections were developed with the intent to function in generic contexts, but we learned in practice that some clashed with emotional and situational content or were confusing given the context. since we also observed many, if not more, successful instances of generic reflections, we are able to contrast these contexts with the unsuccessful ones, and can develop a heuristic for selecting generic reflections rather than selecting them at random, as well as adapt the language of our current generic reflections to be more appropriate for the expressive interviewing setting. to assess the extent to which our expressive interviewing system delivers an engaging user experience, we conduct a comparative study between our system and the conversational mental health app woebot (fitzpatrick et al., ) . we recruited participants and asked them to interact independently with each system to discuss their covid- related concerns. more specifically, we asked them to use each system for - minutes and provide evaluative feedback pre- and post-interaction. to avoid ordering bias, we randomized the order in which each participant evaluated the systems. in addition, we randomized the order in which the evaluation questions were shown. before interacting with either system, participants rated their life satisfaction and their stress level. after the interaction, participants reported their stress level again and rated several aspects of their interaction with the system, including ease of use, usefulness (in terms of discussing covid- related issues and motivation to write about them), overall experience, and satisfaction, using mainly binary scales. for example, the questions "did motivate you to write at length about your thoughts and feelings? yes/no" and "how useful was c.p. to discuss your concerns about covid? useful/not useful" assess whether the system encouraged the user to write about their thoughts and feelings about covid and whether the system provided guidance for it. tables and show the percentage of users that provided positive or high scores (> on a -point scale) for each of these aspects after interacting with both systems (stress_before: % vs. %; stress_after: % vs. %). as observed, fewer participants report high levels of stress after using either system.
however, we see a smaller fraction of participants reporting high levels of stress after interacting with expressive interviewing, suggesting that our system was more effective in helping participants reduce their stress levels. overall, participants reported that expressive interviewing was easier to use, more useful for discussing their covid concerns, and motivated them to write more than woebot did. similarly, users reported a more meaningful interaction and a better overall experience. however, it is important to mention that woebot was not specifically designed for discussing covid- concerns and is more general purpose than our system. nonetheless, we believe that this comparison provides evidence that a dialogue system such as expressive interviewing is more effective in helping users cope with covid- issues than a general-purpose dialogue system for mental health. we followed the suggestions of previous research on automated mental health counseling and adopted the goals of being respectful of user privacy, following evidence-based methods, ensuring user safety, and being transparent about system capabilities (kretzschmar et al., ) . the practices of motivational interviewing and expressive writing have numerous studies supporting their efficacy (miller and rollnick, ; pennebaker and chung, ) . the combination of these methods in an interviewing format has not previously been studied, and we intend to continue publishing our findings as the user population expands and becomes more diverse. we will also continue to improve our system and assessment. we have taken efforts to secure user data. we do not ask for identifiers, and data is stored anonymously by session id. the website is secured with ssl. data is only accessible to researchers directly involved with our study. our study has been approved by the university of michigan irb. in this paper, we introduced an interview-style dialogue system called expressive interviewing to help people cope with the effects of the covid- pandemic. we provided a detailed description of how the system is designed and implemented. we analyzed a sample of user interactions with our system and conducted qualitative and quantitative analyses on aspects such as system usefulness, user engagement and reflection effectiveness. we also conducted a comparative evaluation study between our system and woebot, a general-purpose dialogue system for mental health. our main findings suggest that users benefited from the reflective strategies used by our system and experienced meaningful interactions leading to reduced stress levels. furthermore, our system was judged to be easier to use and more useful than woebot when discussing covid- related concerns. in future work we intend to explore the applicability of the developed system to other health-related domains.
table : average ratings grouped by the order in which the prompts appeared. all sessions begin with "major issues."
figure : histograms of the number of words of each user message preceding the generic reflections, grouping users whose stress increased and decreased.
figure : histograms of the number of words of each user message after the generic reflections, grouping users whose stress increased and decreased.
figure : top: before and after stress ratings by users whose stress increased after interaction with c.p. middle: before and after stress ratings by users whose stress remained the same after interaction with c.p.
bottom: before and after stress ratings by users whose stress decreased after interaction with c.p. the bars are ordered by the magnitude of change (top and bottom), or by the static stress rating (middle).
references:
development and deployment of a large-scale dialog-based intelligent tutoring system
motivational interviewing to change quality of life for people with chronic heart failure: a randomised controlled trial
brief motivational interviewing for teens at risk of substance use consequences: a randomized pilot study in a primary care clinic
django software foundation
delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (woebot): a randomized controlled trial
experimental disclosure and its moderators: a meta-analysis. psychological bulletin
polyresponse: a rank-based approach to task-oriented dialogue with application in restaurant search and booking
an empathy-driven, conversational artificial intelligence agent (wysa) for digital mental well-being: real-world data evaluation mixed-methods study
can your phone be your therapist? young people's ethical perspectives on the use of fully automated conversational agents (chatbots) in mental health support
motivational interviewing: helping people change
ad-viser: a dialog system framework for education & research
writing about emotional experiences as a therapeutic process
confronting a traumatic event: toward an understanding of inhibition and disease
expressive writing, emotional upheavals, and health. foundations of health psychology
expressive writing: connections to physical and mental health
linguistic inquiry and word count: liwc
efficient allocation of public health and behavior change resources: the "difficulty by motivation" matrix
roget's thesaurus of english words and phrases
motivational interviewing. the essential handbook of treatment and prevention of alcohol problems
pediatric nurse practitioners' assessment and management of childhood overweight/obesity: results from and cohort surveys
expressive writing and coping with job loss
wordnet affect: an affective extension of wordnet
feelings in many words: natural emotion vocabularies as windows on distress and well-being
annotating expressions of opinions and emotions in language. language resources and evaluation
this material is based in part upon work supported by the precision health initiative at the university of michigan, by the national science foundation (grant # ), and by the john templeton foundation (grant # ). any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the precision health initiative, the national science foundation, or the john templeton foundation.
key: cord- - gu yab authors: logeswaran, abison; chong, yu jeat; edmunds, matthew r. title: the electronic health record in ophthalmology: usability evaluation tools for health care professionals date: - - journal: ophthalmol ther doi: . /s - - - sha: doc_id: cord_uid: gu yab introduction: the adoption of the electronic health record (ehr) has grown rapidly in ophthalmology. however, despite its potential advantages, its implementation has often led to dissatisfaction amongst health care professionals (hcps).
this can be addressed using a user centred design (ucd), which is based on the philosophy that 'the final product should suit the users, rather than making the users suit the product'. there is often no agreed best practice on the role of hcps in the ucd process. in this paper, we describe practical qualitative methodologies that can be used by hcps in the design, implementation and evaluation of ophthalmology ehrs. methods: a review of current qualitative usability methodologies was conducted by practising ophthalmologists who are also qualified health informaticians. results: we identified several qualitative methodologies that could be used for ehr evaluation. these include: (1) tools for user centred design: shadowing and autoethnography, semi-structured interviews and questionnaires; (2) tools for summative testing: card sort and reverse card sort, retrospective think aloud protocol, wireframing, screenshot testing and heat maps. conclusion: high-yield, low-fidelity tools can be used to engage hcps with the process of ophthalmology ehr design, implementation and evaluation. these methods can be used by hcps without the requirement for prior training in usability science, and by clinical centres without significant technical requirements. electronic health records (ehrs) are defined by the international organization for standardization as 'a repository of data in digital form, stored and exchanged securely, and accessible by multiple authorized users' [ ] . a number of studies have shown that poorly designed ehrs can be associated with patient and health care professional (hcp) dissatisfaction, reduced patient contact time and physician burnout [ ] . some of the issues include the presence of too many screens, options and prompts. the process of entering data into the system can be unintuitive, with clinicians having to adapt working practices to fit the workflow of existing ehrs [ ] . the impact of covid- has confirmed the necessity and usefulness of structured queries, triage and prioritization; these are elements that can potentially be addressed by well-designed ehrs, which might further drive the usage and adoption of ehrs. ehr vendors in countries such as the usa are obliged to meet certification requirements set by the office of the national coordinator for health information technology in efforts to promote user centred design (ucd). it has been shown that there are significant variations in the ucd processes and testing methodologies used by vendors [ ] . ucd processes and usability testing methodology reports provided by vendors can be complex, making it difficult for hcps who are not trained in usability science to understand the information. fully developed and implemented ehrs should ideally be continuously and independently evaluated by end users, much like post-market surveillance of a pharmaceutical drug or medical device. a systematic review published in showed that the most used usability evaluation tools were surveys or questionnaires distributed among end users [ ] . while surveys are advantageous in determining a user's perceptions about an ehr system, they are poor at identifying specific usability problems that can be used for targeted improvements. ophthalmology is a unique branch of medicine in that it is both a medical and surgical specialty. there is limited published research on usability evaluation of ophthalmology ehrs [ , ] . the aim of our paper is to discuss practical qualitative methods for usability evaluation of ophthalmology ehrs.
these methods can be used by hcps without the requirement of prior training in usability science, and by clinical centres without significant technical requirements. this allows for continuous end user engagement with the ehr vendor. a literature search was conducted on pubmed, medline and google scholar using the search terms 'usability testing', 'electronic health records', 'electronic patient records'. manual searches of bibliographies, citations and related articles were also undertaken. eligibility assessment was conducted by yjc and al, who are practising ophthalmic surgeons and qualified health informaticians. this article is based on previously conducted studies and does not contain any studies with human participants or animals performed by any of the authors. we identified six different types of methodologies which can be used for the user centred design process and the summative testing process, summarized in table . these methodologies were selected on the basis of their ease of use and accessibility to hcps who are not trained in usability science. the authors of this paper are clinicians with formal qualifications in health informatics. we have simplified complex domains of usability science so that these tools and techniques can be understood and used by a wide range of hcps with different educational backgrounds. the first stage in ehr development is ucd. this process puts the needs of the end user at the forefront of ehr design, to ensure that they are adequately reflected in the final ehr system. this can be particularly challenging in a field such as ophthalmology because of the multidisciplinary approach to patient care. for example, a single outpatient episode in an ophthalmic unit might involve the optometrist, orthoptist, ophthalmic photographer, ophthalmic nurse, ophthalmic technician and the ophthalmologist, who will all interact with the ehr and have their own unique requirements. effective usability tools are needed to identify these needs, which are often complex and hard for the end user to communicate. a combination of three tools can be used in the ucd process: ( ) shadowing, ( ) semi-structured interviews and questionnaires. shadowing is a technique where the researcher follows participants in their daily activities over a period of time, with documentation of the user actions by note-taking or video recording [ , ] . this provides a unique opportunity for researchers to understand the different terminologies used in a clinical setting, and what information or clinical events are considered critical to different hcps. for example, the orthoptist will require specific tools to document the measurement of eye movements, while the medical ophthalmologist will be reliant on temporal comparisons of photographs and imaging of the eye. the researcher functions as an apprentice, with the aim of understanding and appreciating the role and requirements of the master [ ] . the method of autoethnography can follow on from shadowing, as the researcher now has a basic understanding of the practical requirements of the end user. autoethnography is a research method used in the field of human-computer interaction, where the researcher becomes a participant to obtain a deeper knowledge of the subjective state of end users [ ] . this is achieved through the human capacity for empathy. for example, the researcher could engage in forms of self-reflection and writing, as though he were the end user himself. there are several limitations to shadowing and autoethnography.
firstly, researchers might have varying degrees of access to real-world clinical settings. secondly, it might still be difficult for researchers who are not content experts to appreciate the difficulty and varying complexity of certain clinical tasks. structured interviews or questionnaire surveys entail a list of questions, with little opportunity for respondents to provide suggestions outside of a rigid template. in the field of ophthalmology, there have been several studies looking at the adoption of ehr in the uk and the usa. for example, a cross-sectional study in showed that fewer than % of ophthalmology units in the uk were using ehr [ ] . in the usa, a cross-sectional study showed that the adoption rate of ehr in ophthalmology was %, with respondents having a more negative perception of ehr productivity outcomes and effect on practice cost compared to previous studies [ ] . national cross-sectional surveys are useful for providing information about the general adoption and perception of ehrs. however, the results of such findings often fail to identify specific usability issues that can be targeted for improvement. national surveys are often conducted only once every few years, while end users should ideally be engaged continuously so that iterative improvements can be made. in contrast with structured interviews, semi-structured interviews are in-depth interviews where respondents are provided with pre-defined open-ended questions, which are subsequently thematically analysed to generate a comprehensive picture of the collective experience. studies have suggested that five participants could reveal about % of all usability problems, although there are reported benefits in terms of increased sample sizes in usability and utility testing [ ] [ ] [ ] . semi-structured interviews can be easily conducted in individual ophthalmology units or clusters of units. for example, an open-ended question like 'what specific information do you need to record during an oculoplastics consultation?' could reveal information such as the need for templates for eyelid measurements and tear film break-up time, accompanied by anatomical drawings of the eyelids and orbit. there are several commonly cited limitations to this method. firstly, manual clustering of themes poses a risk that conclusions would be over-reliant on the researcher's 'often unsystematic views about what is significant and important' [ ] . the response given by respondents might also be influenced by what he or she thinks a particular situation requires [ ] . people might also react differently depending on how they perceive the researchers [ ] . once the end user needs have been ascertained, the system needs to be designed to reflect them and undergo rigorous testing. this is referred to as the summative testing process. participants involved in this process should reflect the end user demographics of the ehr. ideally, these are the same users whose needs were addressed in the ucd process. it is important to highlight that ucd and summative testing are not sequential processes, but rather iterative in nature. constantly redesigning and testing the system to ensure end user needs are addressed is essential to ensure end user satisfaction. there are a number of tools that can be used to conduct summative testing: ( ) card sort and reverse card sort, ( ) retrospective think aloud protocol and ( ) wireframing, screenshot testing and heat maps.
card sorting is an effective, cheap and easy way to help understand the expectations of end users about how content should be organized [ ] . this is a common tool in usability science. however, this technique is not often used in the field of usability testing in ehr. in a recent literature review of the relative frequency of use of usability analysis methods in ehr, card sort was used only % of the time [ ] . this is surprising given that card sorting can be done using affordable software. the way card sort works is that a list of relevant topics is first identified. for example, a list of - topics might include items such as primary complaints, current medications, intraocular pressure, visual acuity, driving status, and laboratory blood tests. participants are then asked to group topics together into categories. topics such as primary complaints, ocular history, past medical history, systemic history, family history, driving status and allergies could then be grouped under the category of 'history'. participant agreement about categories provides researchers with information about which items should be grouped together. this can subsequently inform the structure of the ehr. tree testing, or reverse card sort, is a technique used to evaluate the ease with which content can be found and navigated within a software's information architecture [ ] . the 'tree' is the site structure of the ehr, which is essentially a simplified structure of the software. this allows the structure of the ehr to be evaluated in isolation, without the effects of factors such as visual design or navigation aids. users are provided with tasks and asked to complete them by navigating a collection of cards (each with a category created during the initial card sort). this evaluative approach provides information to the researcher about whether a predetermined hierarchy is a good way to find information. figure provides an example of the user journey based on the clinician inputting a patient's tear break-up time. this provides researchers with a representation of the way end users navigated through the structure of the ehr to accomplish a particular task. there are several limitations to the card sort methodology [ ] . firstly, this type of study is performed outside the actual ehr system and is stripped from its context. one is able to obtain information about how individuals combine concepts; however, this does not provide information on how effectively users will find relevant information in the final ehr system. secondly, it is difficult to determine the extent to which the wording of topics influences the way subjects group cards. to counter this limitation, participants can be instructed to think of underpinning concepts beyond the words provided. lastly, the card sorting system means that users are not allowed to place one topic into more than one category. in reality, the information landscape of ehrs often allows concepts to reside in multiple places across multiple pages. reverse card sorting can be a useful technique for the analysis of navigation issues. however, the method above does not provide researchers with the participants' reasoning when making those particular navigational decisions. another useful method is the retrospective think aloud (rta) protocol. during this process, participants first carry out their tasks silently, and subsequently verbalize their thoughts in retrospect [ ] .
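returning to the card sort step described above, the sketch below builds a simple topic co-occurrence count from hypothetical participant groupings; the topics and groupings are illustrative assumptions, not data from the paper.

```python
# minimal sketch (hypothetical groupings): count how often two topics are
# placed in the same category across participants; frequently co-occurring
# topics are candidates for the same ehr section.
from itertools import combinations
from collections import Counter

# each participant's card sort: category -> topics
participants = [
    {"history": ["primary complaints", "driving status", "allergies"],
     "exam":    ["intraocular pressure", "visual acuity"]},
    {"history": ["primary complaints", "allergies"],
     "exam":    ["intraocular pressure", "visual acuity", "driving status"]},
]

co_occurrence = Counter()
for sort in participants:
    for topics in sort.values():
        for a, b in combinations(sorted(topics), 2):
            co_occurrence[(a, b)] += 1

for pair, count in co_occurrence.most_common():
    print(pair, count)
```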
the retrospective verbalization can be supported by adjuncts such as video recording of the process or computer log files [ , ] . the theory behind this is that when verbalization is accompanied by adjuncts, the rta combines the benefits of working silently and of thinking aloud. following on from the identification of end user usability issues, a low-fidelity prototype of an ehr can be created with a technique known as wireframing. wireframe mock-ups are two-dimensional illustrations of a webpage or a software's interface. they do not involve design elements, which allows for quick iterative assembly and testing [ ] . the benefits of wireframes include their simplicity of use in determining a software's information architecture and its intended functionality in the interface. wireframes can also be created without the need for coding or programming expertise. it is interesting to note that wireframing was not used by any of the usability studies performed on ehrs [ ] . this could be due to the perceived difficulty of creating a prototype owing to a lack of usability training amongst clinicians. wireframes can be built using simple software from companies such as balsamiq (https://balsamiq.com). an alternative would be to simply sketch the architecture of the ehr manually on blank pieces of paper. one of the limitations of low-fidelity wireframes is the lack of the interactivity and functionality of the actual ehr, such as accordion menus, dropdown windows and active links. wireframes also do not take into account the technical elements of existing ehrs. on the other hand, a fully interactive prototype requires significantly more resources in terms of technical input, time and cost. this would be impractical for clinicians unless they have specific training and resources dedicated to usability science. screenshot testing is a usability tool which can be used in conjunction with the low-fidelity wireframe prototypes. chalkmark software, developed by optimal workshop, is a simple method of conducting screenshot testing [ ] . participants are asked to complete a series of tasks which require them to navigate through the wireframes. quantitative information that can be generated includes the proportion of users that got their first click correct, the locations that the participants clicked, and the average time taken to complete a task. results of this user testing method can then be displayed as a visual map of activity, indicating the areas where users clicked most often. this paper provides hcps with foundational skills in usability analysis, which are not currently part of the core curriculum in medical schools or specialist training programs. in many countries, no national frameworks exist mandating the use of such tools in ehr design, resulting in variable uptake of these methodologies by the few major ophthalmology ehr vendors [ ] . providing hcps with these tools will enable them to engage in meaningful conversation with commercial ehr vendors, and play an active role in their development. this will improve the accountability of ehr vendors in adopting usability-driven processes, improve ehr design and improve patient and hcp satisfaction [ ] . it is important to appreciate that the usability tools that we described form only one component of the ehr development process. these tools should not be used in isolation but rather in conjunction with other ehr development processes such as utility analysis (whether the system provides features needed by the end user) and prototyping.
it is, however, beyond the scope of this paper to explore the full details of the ehr development process. the development and refinement of ehrs should be a continuous and iterative process, in which changes at one stage may require evaluation and changes at another stage. end users should be continuously involved and engaged in usability testing of an ehr. this is very much like post-marketing safety evaluations of technology and medications used in real-world clinical settings. with these tools, which can be deployed in any clinical unit away from resource-rich research centres, we hope that clinical information leads can work together with ehr vendors and various stakeholders to continuously improve the usability of ehrs. funding. no funding or sponsorship was received for this study or publication of this article. the rapid service fees were funded by the authors. authorship. all named authors meet the international committee of medical journal editors (icmje) criteria for authorship for this article, take responsibility for the integrity of the work as a whole, and have given their approval for this version to be published. disclosures. abison logeswaran is a topol digital health fellow who is funded by health education england. yu jeat chong and matthew r edmunds have no conflicts of interest to declare. compliance with ethics guidelines. this article is based on previously conducted studies and does not contain any studies with human participants or animals performed by any of the authors. open access. this article is licensed under a creative commons attribution-noncommercial . international license, which permits any non-commercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the creative commons licence, and indicate if changes were made. the images or other third party material in this article are included in the article's creative commons licence, unless indicated otherwise in a credit line to the material. if material is not included in the article's creative commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. to view a copy of this licence, visit http://creativecommons.org/licenses/bync/ . /.
references:
health informatics-electronic health record-definition, scope, and context
physician burnout in the electronic health record era: are we ignoring the real cause?
physicians' use of electronic medical records: barriers and solutions
a framework for evaluating electronic health record vendor user-centered design and usability testing processes
an appraisal of published usability evaluations of electronic health records via systematic review
time requirements for electronic health record use in an academic ophthalmology center
secondary use of electronic health record data for clinical workflow analysis
the methods lab: user research for design. london: design for ageing network
using video to re-present the user
user-centered design: why and how to put users first in software development. in software for people
empathy and experience in hci
distribution and extent of electronic medical record utilisation in eye units across the united kingdom: a cross-sectional study of the current landscape
adoption of electronic health records and perceptions of financial and clinical outcomes among ophthalmologists in the united states
a pragmatic view of thematic analysis. burlington: morgan kaufmann
beyond the five-user assumption: benefits of increased sample sizes in usability testing
social research methods
social research methodology: a critical introduction. london: macmillan international higher education
the good research guide: for small-scale social research projects. mcgraw-hill education (uk): maidenhead
card sorting: designing usable categories
card sorting: a definitive guide
gaining user insight: a case study illustrating the card sort technique
a comparison of the four prominent user-based methods for evaluating the usability of computer software
usability and the web: an overview. network notes, . information technology services, national library of canada. retrieved
first-click testing software | optimal workshop
key: cord- -hs pfg b authors: song, jinyue; gu, tianbo; feng, xiaotao; ge, yunjie; mohapatra, prasant title: blockchain meets covid- : a framework for contact information sharing and risk notification system date: - - journal: nan doi: nan sha: doc_id: cord_uid: hs pfg b covid- causes a global epidemic infection, which is the most severe infection disaster in human history. in the absence of particular medication and vaccines, tracing and isolating the source of infection is the best option to slow the spread of the virus and reduce infection and death rates among the population. there are three main obstacles in the process of tracing the infection: (1) the patient's electronic health record is stored in a traditional centralized database, where the infection data could be stolen or tampered with; (2) the confidential personal identity of the infected user may be revealed to a third party or organization; (3) existing infection tracing systems do not trace infections from multiple dimensions: each system provides either location-based or individual-based tracing. in this work, we propose a global covid- information sharing system that utilizes blockchain, smart contract, and bluetooth technologies. the proposed system unifies location-based and bluetooth-based contact tracing services on the blockchain platform, where automatically executed smart contracts are deployed so that users can get consistent and non-tamperable virus trails. the anonymous functionality provided by the blockchain and bluetooth technology protects the user's identity privacy. with our proposed analysis formula for estimating the probability of infection, users can take measures to protect themselves in advance. we also implement a prototype system to demonstrate the feasibility and effectiveness of our approach. to date, coronavirus has infected more than . million people and caused nearly , deaths [ ] . in particular, the united states has become the country with the largest number of known infections [ ] . every country has its own privacy policy for sharing information about infected people, which makes it challenging to share information across the world with reliable privacy protection.
information sharing in a centralized manner, such as by mit, apple, and google, who announced tracking solutions that store users' personal data in the cloud [ ] [ ] , relies heavily on users' trust. once personal data has been uploaded to the cloud, users generally cannot control the potential abuse. if a comprehensive data security solution is missing, the private user data in the cloud may be hacked for any other harmful purposes. as a global organization, the world health organization (who) collaborates with governments around the world to share information and enhance the management of epidemic prevention. however, who is losing trust from some countries and cannot obtain sufficient support. some other governments may conceal, falsely report, or withhold epidemic information. these may create a gaping security hole for global epidemic prevention. as a result, there is no way for individuals to share their information and protect their data privacy simultaneously. government departments and organizations may access the health and medical data of all people, and go beyond the scope of their responsibility and duty. for example, some government health departments may locate the personal identity of infected patients, and then force them into a centralized isolation shelter, resulting in secondary infections and restricting personal freedom. actually, apple and google will share the infected individuals' information with health authorities [ ] , which means people's personal data privacy and human rights are being violated without their knowledge. currently, two separate types of tracing systems exist: location-based and individual-based contact tracing. location-based contact tracing always provides a centralized service and records whether there are infections in the given locations, without knowledge of infection movement [ ] . the individual-based tracing systems only focus on person-to-person contact via bluetooth [ ] [ ] , and they do not have a record of where users got infected. who states that the virus could survive on material surfaces [ ] [ ] , so the virus affects the environment of people's daily activities, but the individual-based system cannot trace and estimate the covid- effect on a given location. our proposed system combines the location-based and individual-based systems so that users can access the infection status of public areas and look up their personal contact tracing history at the same time. our lab already deploys a centralized location-based tracing system [ ] and we may merge our two systems for future research. blockchain can natively provide a decentralized service to share information and protect the privacy of people [ ] . user information can be packaged in transactions and stored in the blockchain at each computing node. even if the data of one node is manipulated, it will not affect the data's consistency, because the manipulated data will not pass verification by other nodes. a smart contract is a program that runs on the blockchain platform; it can execute instructions in a distributed manner and maintain output consistency. in our system, the smart contract allows the user to check in at each location and query the blockchain database for the location infection status. even though mit, apple, and google provide virus contact tracking solutions, the tracking service still brings the following new challenges.
the infection transmission factors considered by existing virus tracking service systems are too simple. the virus is not only carried by the infected person but also remains in the environment, so a user may still be infected by virus attached to an object surface. therefore, our proposed system not only records the user's contact history but also tracks the user's travel trajectory. protecting the data privacy of a user's personal health information in a public network is also very challenging. traditional methods of hiding users' personal information, such as removing names, addresses, and social security numbers, cannot fully protect user privacy, so we embedded bluetooth address randomization in the contact tracking service of the system. besides, after users upload health data and identity information, they lose the ownership and the ability to control the data. apple and google provide anonymous services to users, but infected patients will be reported to the government health department [ ] . we choose the blockchain database as the storage method for the system; combined with the address randomization function of bluetooth, only users can identify and verify their health data and personal information stored in the database. we design and propose a blockchain-based covid- contact information sharing and risk notification system that can be used by the world to take preventive measures against epidemics. this system implements the following main functions with smart contracts and bluetooth embedded: (1) users can record their visited location information and personal contact history to the blockchain database; (2) users can update their infection status; (3) the system can update location infection status; (4) the system can notify users who have previously been exposed to infected people or infected locations; (5) the system can estimate the probability of being infected for users based on their location visiting history and personal contact records. for example, a shopping mall can use our system to detect whether the building is infected and to track the daily infection status. customers can also query the database for the mall's infection record before their shopping. the status of an individual is written to the blockchain and cannot be changed. our system is able to protect privacy. we consider a variety of privacy concerns that can be addressed by our system. users can regularly change their cellphone bluetooth mac addresses and encode these addresses as virtual ids. then, it is hard to trace an individual identity from the public information written in the blockchain. our system will not trace personal identity or report personal health information to authorities. we have the following contributions. • we propose a hierarchical smart contract group in this system to manage the infection status of each location and transportation. this design reduces the average operation cost and request processing time in the system. • we build and merge location-based and individual-person-based virus contact tracing and notification systems, which track the virus infection activity from higher dimensions. • we embed a blockchain database in the system to ensure the safety of users' data; this design avoids the problem that health information may be tampered with or stolen in a centralized database. • our system uses a weak random bluetooth address design to generate the user's identity information.
this design not only protects the user's privacy but also reduces the congestion of data transmission within the blockchain network. • we propose a mathematical formula for the possibility of user infection, which quantifies the user's contact history and traveling history from multiple dimensions and thus provides a basis for the system's notification service. • we propose an optimal equation for the operating costs of the system, simulate person-to-person contact and user check-in activities in our system, and evaluate the system performance based on different quantities of users and smart contracts. roadmap: this paper is organized as follows: section ii presents the system overview and system components. section iii formulates challenging problems for our system. section iv describes the system design from the perspective of four layers. section v shows how we simulate and evaluate system performance. the last two sections vi and vii present the related work in smart contract tracing and the conclusion for this paper. our system can track the locations visited by users and the personal contact history of users with others. when a user reports himself infected, our system will notify other users who have been in direct or indirect contact with him, and provide an assessment of the probability of infection. in a general scenario, the user may visit many public places, such as offices, restaurants, shopping centers, or gyms in one day, so he can use the designed contact tracing service to upload his visiting records, including the time and location, to our system. then, our system will store the visiting records in decentralized blockchain databases. also, users can check the infection status of a certain location before their visit to ensure safety. when a user reports his infected status, a smart contract group embedded in our system will update the virus infection status of the locations and transportation that the user visited and took, based on his visiting records. our system helps users track the history of their indirect contact with others because the virus can land on the surfaces of objects and remain active [ ] ; even if a user does not have face-to-face contact with an infected patient, there is still a possibility of being infected after he touches the surface of a virus-contaminated object. our system uses the bluetooth functionality in users' mobile devices to automatically detect others and then upload their contact records to decentralized blockchain databases. the range of bluetooth detection is relatively small, about to meters [ ] , so when the user's mobile phone detects bluetooth signals, it indicates that there are people nearby. when a user reports his infected status, our system will broadcast his infection status and alert other users to check whether they have had close contact with this infected user. it is important to trace the direct contact recorded by bluetooth because viruses can attach to water vapor and spread through the air [ ] , and users may be infected by other people at close range [ ] . therefore, uploading users' contact history records will help them track the infection transmission path of the virus and assess their probability of being infected.
our proposed system provides users with this integrated mobile service, which contains the following functionalities: probability of infection: according to the who manual for medical staff [ ] [ ] , healthy people can be infected directly and indirectly, where the direct factor is person-to-person contact at a close distance, and the indirect factor is that the virus spread by a patient survives on the surface of an object and infects healthy people after they touch that surface. so the system will evaluate the user's risk from these factors based on the feature data extracted from location tracing and person-to-person contact tracing, such as the length of contact time between users, the spatial distance between users, and the materials of objects in public places. notification of infection: after the estimation of the infection probability for a user, the notification function sends a warning alert to him, which reminds him to prepare for virus infection in advance or to seek medical help before his health condition gets worse. another scenario is when a user reports his infected status: our system will broadcast his virtual identity to other users. then, the users who receive the notification will query the local database to see whether they have had any direct or indirect contact with the infected patient and calculate the infection probability. in our system, three main technologies guarantee data security and personal privacy: the decentralized database in the blockchain, the automatic execution of smart contracts, and the randomization of bluetooth mac addresses. (a) data security: our design guarantees that user data will not be manipulated. the blockchain protocol stipulates that the current block always uses the hash value generated from the previous block, so if an attacker manipulates a transaction in a block, he must recompute that block and every subsequent block before the next new block is mined, verified and accepted by other users. this computing workload is extremely large, so it is almost impossible for an attacker to forcibly manipulate the blockchain data. moreover, users in the network can detect violations of smart contracts; such attempts to violate a contract are verified to be invalid, and those malicious transactions will not be stored in the blockchain. our system contains a smart contract group to handle all user check-in requests for public locations and transportation. since smart contracts have the properties of being unmodifiable, requiring no supervision, and being automatically executed, smart contracts with distributed features ensure that user data cannot be tampered with or produce malicious results. so, this design ensures the security of user data. decentralized blockchain databases and smart contracts also enhance the usability of our system. since no individual or centralized entity can control all data, there will be no smart contract or system crash due to malicious operations by some users. (b) identity privacy: we use bluetooth technology to protect user data privacy. the mac addresses uploaded by users are randomly generated by the bluetooth protocol [ ] . although these addresses are stored in the blockchain database, each user's mac address is random and is not fixedly bound to the mobile device. this guarantees that users will not be tracked and located by the addresses provided in the network.
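to make the address-randomization idea concrete, the following python sketch generates bluetooth-style random addresses and a short rotation schedule; the address-bit handling and rotation count are simplifying assumptions for illustration, not the exact bluetooth specification or the system's implementation.

```python
# minimal sketch (assumed address format): generate random bluetooth-style
# addresses so a device is not identified by a single fixed mac address.
import secrets

def random_private_address() -> str:
    # 6 random bytes; mark the first byte as a locally administered,
    # non-multicast address (a simplification of bluetooth address types).
    addr = bytearray(secrets.token_bytes(6))
    addr[0] = (addr[0] & 0xFC) | 0x02
    return ":".join(f"{b:02x}" for b in addr)

# rotate the advertised address on a schedule (e.g., a few times per hour)
rotation_schedule = [random_private_address() for _ in range(4)]
print(rotation_schedule)
```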
from another perspective, bluetooth technology frequently replaces the user's mac address [ ] , which means a user will gradually use multiple different mac addresses within a unit of time. other people in the surroundings have no way to associate a mac address with this user through investigation. the frequently randomized mac address protects the user's privacy in real life. in order to reduce the operating overhead and internet congestion at the system level, we choose weakly randomized generation of bluetooth mac addresses. users upload their random mac addresses to the system, and the resulting system overhead is measured in ethereum gas [ ] . the larger the quantity of random mac addresses, the better the privacy protection users could have, but a larger quantity will lead to higher operating costs and network congestion. therefore, we need to find a balance between a sufficiently large quantity of bluetooth random addresses for privacy protection, and a relatively small quantity for lower system operating cost. we highlight four problems for our proposed covid- information sharing and risk notification system: latency, throughput, operating cost, and probability estimation; these problems are described by mathematical formulas at a high level. in our system, latency is the time difference ∆t describing how long it takes for the user's latest check-in request to be processed completely by smart contracts and stored in the databases. we consider that latency is affected by the following five factors: (1) the number of users |u| in the system, (2) the frequency freq of users sending requests, (3) the size of one block |b|, (4) the height of the smart contract group scg.height, and (5) the length of the waiting queue |queue| of the smart contract. u includes not only users who already exist in the system, but also new users who enter the system within a unit time. the total number of users and the frequency freq of users' check-ins determine the total number of user requests per unit time in the system. if the number of user check-in requests exceeds the processing capacity of the smart contract per unit time, the user's requests will enter the waiting queue of the smart contract, which increases the request processing time. we introduce the variable of block size |b| because it is one of the bottlenecks of the blockchain's development towards high throughput and low latency [ ] . the height of the smart contract group scg.height and the length of the queue |queue| in each contract are also important factors that affect latency. in our proposed hierarchical structure, smart contracts at the same level do not affect each other's efficiency in processing requests. the requests received by the smart contracts at the current level all come from higher-level smart contracts, so the hierarchical structure we propose is a tree structure, whose height is described as lg(|scg|). as mentioned before, the smart contract will put requests that have not been processed in time into the waiting queue. if the number of unprocessed requests is greater than the length of the queue, these requests will be abandoned, and the smart contract then needs to wait for other nodes to synchronize the transaction operations and data. this brings a longer latency. we thus establish the latency formula latency = φ(δ_{u,freq}, δ_{|b|}, δ_{scg.height,|queue|}), where the δs are a series of transition functions used to determine the latency. we propose that the three polynomial δs are combined by the function φ.
in our system, throughput tp refers to the number of user requests that the system can completely handle in a unit time. it can intuitively show the system's ability to process user requests: the greater the throughput, the stronger the processing capacity of the system. throughput is limited by latency, the packet loss rate rate_pl and the bandwidth bw. as described in the previous section, five factors in the system can affect the latency, but rate_pl and bw depend on the network conditions at the user end. therefore, the throughput can be defined as tp = θ(latency, rate_pl, bw), where θ is the transition function with three arguments that determines tp. operating cost is another important problem we consider. reasonable operating costs ensure that our system can serve users stably and support its operation for a long time. the operating cost in the system is measured in ethereum gas [ ] and includes the following five influencing factors: the location-based and bluetooth-based contact tracing services, the health tracing service, and the setup and operation of smart contracts. we will explain all these services and components in the next section. because of the blockchain's decentralized structure, users can query their local blockchain database without consuming gas, so we do not include the cost of the database in the operating cost calculation. we consider that the number of users and requests has a polynomial relationship with the cost of the three tracing services. the setup cost of a smart contract is fixed, but its operating cost will increase as the number of users increases. so we have the cost formula cost = λ(loc, bt, heal, setup, op), where loc represents the location-based contact tracing service, bt represents the bluetooth-based contact tracing service, heal represents the health tracing service, and op represents all the operations in the smart contracts, like adding a user check-in and getting a location infection status. we define λ to be the transition function that calculates the cost combined from these factors. then, we measure the average and variance of the operating cost, which represent the system's stability in an optimal condition. the formulas when the system has the five factors at minimal cost are: args_var = arg min var(λ(loc, bt, heal, setup, op)) and args_avg = arg min avg(λ(loc, bt, heal, setup, op)). therefore, we can provide the optimal arguments with a minimal system cost, shown as args_optimal = mincost(args_var, args_avg, ζ), where we introduce the penalty function ζ to adjust the five arguments and get the globally optimal sets for the cost calculation. in the previous section, we briefly described the functionality of estimating the infection probability in the health tracing service. since the ground truth data contains only binary values representing infected or not, without a percentage value for the probability of infection, we introduce the statistical logistic regression analysis method with iteratively reweighted least squares (irls) [ ] to perform maximum likelihood estimation and find the optimal arguments. we assume that the data set is relatively clean and there are no outstanding outliers. the ground truth dataset contains n user data points (x_i, y_i), i = 1, ..., n, where x_i is the independent variable tuple (rssi, ∆t_b, ∆t_c, ms) and y_i is the dependent variable with binary values {0, 1}. the elements of the tuple that affect the infection probability are:
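as an illustration of the arg-min cost formulation above, the sketch below evaluates a few hypothetical system configurations by the average and variance of simulated gas costs and keeps the cheapest; the cost model, gas figures, and candidate configurations are all assumptions for illustration, not measurements from the system.

```python
# minimal sketch (simulated gas costs): score candidate configurations by
# the average and variance of operating cost, mirroring args_avg / args_var.
import random
from statistics import mean, variance

def simulate_gas_cost(num_users: int, contract_height: int) -> float:
    # placeholder cost model: assumed per-request gas plus contract overhead
    per_request = 21_000 + 5_000 * contract_height
    return num_users * per_request * random.uniform(0.9, 1.1)

candidates = [(users, height) for users in (100, 1_000) for height in (2, 3, 4)]
scores = {}
for users, height in candidates:
    runs = [simulate_gas_cost(users, height) for _ in range(50)]
    scores[(users, height)] = (mean(runs), variance(runs))

best = min(scores, key=lambda k: scores[k][0])  # minimise average cost
print("best configuration:", best, "avg gas:", round(scores[best][0]))
```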
• ∆t b is bluetooth contact time interval, which indicates how long users detect each other's bluetooth signal from a short distance. • ∆t c is a overlapping time calculated based on two checkin time points t i and t j , which infers the time when two users enter the location, and ∆t c = |t it j |. • ms is a discrete value in set ms representing a virus residual active time period on a material surface [ ] . so we have: where β is the set of parameters {β , β , ..., β } for these five arguments including the constant. then, we can use the estimation method irls to get the fitted model and arguments: args optimal [w] = irls(x, y, w, µ) where w is arguments tuple (rssi, ∆t b , ∆t c , ms) and µ is the predicated value from formula ( ). in this section, we will define and explain our system components from the perspective of the four layers in the system, and the interactions within the contact tracing mechanisms. our trace and notification system contains four layers for users to trace person-to-person contact via bluetooth, check-in locations, and lookup infection status with other users on the blockchain platform. fig. shows four layers: user interaction layer, mobile service layer, smart contract service layer, and data storage layer. this system provides two primary services for trace and notification: bluetooth-based personal contact trace service and location-based contact trace service. both of the two services are developed on the blockchain platform, and the data generated from these two services is stored in the distributed blockchain databases. the locationbased contact trace is coordinated by the smart contracts in the third layer and the bluetooth-based contact trace is handled in the second layer. (a) user interaction layer at the user interaction layer, we have two entities: user u and location l. users are people who hold bluetooth-enabled mobile phones and have two health statue types: healthy users u normal and infected users u infected . users access to mobile service in the second layer and update their health status in the system based on their medical reports. we assume that users always honestly upload their infection status to our system. a location l is a public place or transportation that users often visit in their daily lives, such as office, restaurant, stadium, bus, and even airplane. location l also has two status types, uninfected location l normal and infected location l infected . if an infected user u infected visited this location, then this location l would be marked as l infected by the system. (b) mobile service layer mobile service layer is the the core handler in our system and it interacts directly with the other three layers. mobile service ms is our proposed mobile phone application in this layer. cooperating with other layers, the mobile service layer is an interface layer for providing users with services, including the following two aspects: contact tracing service based on bluetooth or location, and health tracing service supported by data provided from the blockchain database. contact tracing services include location-based and bluetooth-based, focusing on indirect and direct contact between users. the bluetooth-based service is an integral part of the client's mobile phone service which embeds the bluetooth function. it can sniff other surrounding users' bluetooth devices, and broadcast its own bluetooth random mac address macaddr as a virtual identity. 
the bluetooth-based service described above can exchange the user's randomized mac address with other users via bluetooth, and then pack the received mac addresses macaddrs of other users, the time interval timeinterval of the interaction and the received signal strength index rssi into a transaction tx, which is broadcast to the blockchain network. in addition, when the user receives another user's infection notification from a broadcast transaction tx, our mobile app automatically queries the local blockchain database for the infected user's bluetooth address and check-in information, and checks whether the infected user had direct or indirect contact with the current user.

the second contact tracing service is location-based tracing. it accepts user check-in requests req_checkin from the user interaction layer and forwards these requests to different smart contracts in the third layer based on the sender's location. at the same time, the infection status of the location l is affected by the user's check-in request req_checkin. if the user u updates his health status to infected u_infected and broadcasts it to the smart contract, the location l will be marked as infected l_infected by the corresponding smart contract. these two services are explained step by step and described with mathematical formulas in the next two sections.

the health tracing service is the third sub-service in the mobile service layer. this service has two functions: 1. broadcast the user's infected status u_infected to alert other users and update the infection status of the locations ls that the infected user visited, and 2. estimate the probability prob_u(infection) of a user being infected. users us at the first layer can update their health status through this service at the second layer, pack their health status {u_normal, u_infected} into a transaction tx, send it to the smart contract responsible for infection status at the third layer, and broadcast it to other users on the network. once the health status of the infected person has been updated to u_infected, people who had contact with the infected person are prompted with a warning alert provided by the health tracing service. after a normal user u_normal receives the warning alert, the health tracing service estimates the probability of the user being infected prob_u(infection), based on the data about contact with the infected person u_infected collected from the location-based and bluetooth-based tracing services.

we consider that there is a relationship between the received (bluetooth) signal strength index rssi and the probability p(infection) of infection with covid-19. similar to apple and google's design [ ], we use bluetooth sniffing to detect whether two users are within a close distance d, which can be measured by the received signal strength indication (rssi) of bluetooth [ ]; the authors use a low-pass filter (lpf) to reduce the errors in the measurement data. to avoid repetition, we only adopt the method based on the bluetooth signal strength index, and in the following sections we use the simplified mathematical formulas, with the lpf applied by default, to reduce errors in the measured data as designed. research and experiments by mit show that covid-19 can spread through the air [ ] and can reach feet away from a sneeze [ ]. so, the closer the user is to an infected person, the more virus he is exposed to and the more easily he becomes infected.
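as a rough illustration of the rssi-to-distance idea above, the sketch below smooths raw rssi readings with a simple low-pass filter and converts the result to an approximate distance using a common log-distance path-loss approximation. the filter constant, the reference rssi at one metre and the path-loss exponent are assumptions for illustration, not calibrated values from this system.

```python
# smooth raw rssi samples and convert the result to an approximate distance.
def smooth_rssi(samples, alpha=0.25):
    """simple exponential low-pass filter over raw rssi readings (dbm)."""
    filtered = samples[0]
    for s in samples[1:]:
        filtered = alpha * s + (1 - alpha) * filtered
    return filtered

def estimate_distance(rssi_dbm, rssi_at_1m=-59.0, path_loss_exp=2.0):
    """approximate distance in metres via a log-distance path-loss model."""
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10 * path_loss_exp))

readings = [-63, -67, -61, -70, -66]        # raw rssi samples from one contact
rssi = smooth_rssi(readings)
print(round(estimate_distance(rssi), 2))    # rough proximity estimate in metres
```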
therefore, we correlate the bluetooth signal strength with the possibility of being infected by the virus.

(c) smart contract service layer
the smart contract service layer is the secondary core of our system. the check-in request req_checkin generated when a user visits a location l is processed by the mobile service layer and forwarded to the smart contract service layer, where it is managed by the smart contract group. the smart contract group organises the contracts according to the administrative hierarchy. the top level is the smart contract at the state level contract_state, followed by the county contract_county, then the city contract_city, and finally the smallest unit, the location contract_location. location smart contracts contract_location are managed by city-level contracts contract_city, city-level contracts belong to county contracts contract_county, and county-level contracts belong to state contracts contract_state. each contract inherits from only one superior contract and never belongs to two different superior contracts at the same time.

each location must be in one of three states: {emptystatus, infectedstatus, cleanstatus}, and the corresponding smart contract contract_location dynamically records the infection status of the location l. if an infected user u_infected visits the location l, or a user who has visited the location reports that he is infected, then the location l is considered infected by that user u_infected. only after the location is cleaned, or a certain number of days after being infected, is the location l considered to be in cleanstatus. in order to save the operating cost of the smart contract contract_location and maintain the accuracy of the location l status record, incoming requests reqs trigger the smart contract contract_location to check and update the infection status of the location l; otherwise, the smart contract does not actively check the infection status of the location. this design ensures that users get the latest location status for their requests while avoiding unnecessary smart contract operations.

(d) data storage layer
we deploy a distributed blockchain database db in the data storage layer, and every user and computing node in our network can synchronise to obtain a complete database that is consistent with the data held by others. from the perspective of data storage, a traditional centralized database stores all data [ ] in one data center; a traditional distributed database stores data in multiple data centers, but each center may not hold the globally complete data [ ]. from the data management perspective, traditional databases require a central node to process read and write requests to the database, but a blockchain database does not [ ] [ ], because all users can hold the same database locally and query it directly for consistent results. in our system, the blockchain database stores all transactions in the network, including users' bluetooth contact records, check-in information for visited locations, and changes to users' public health status.
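a minimal sketch of the three kinds of records the data storage layer keeps (bluetooth contacts, location check-ins and health-status changes), together with a purely local lookup against a node's copy of the chain, is given below. the class and field names are illustrative assumptions, not the system's actual schema.

```python
# illustrative record types stored on the chain and a local, gas-free query.
from dataclasses import dataclass
from typing import List

@dataclass
class BluetoothContactTx:
    my_mac: str
    peer_mac: str
    rssi: float
    dt_b: float         # contact interval in seconds
    timestamp: int

@dataclass
class CheckInTx:
    user_mac: str
    geopos: str         # geographic position of the location
    timestamp: int
    health_status: str  # "normal" or "infected"

@dataclass
class HealthStatusTx:
    user_mac: str
    new_status: str
    timestamp: int

def contacts_with(ledger: List[object], infected_macs: set) -> List[BluetoothContactTx]:
    """scan the local ledger copy for contacts with any reported infected mac."""
    return [tx for tx in ledger
            if isinstance(tx, BluetoothContactTx) and tx.peer_mac in infected_macs]
```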
among the three tracing services mentioned above (location-based contact tracing, bluetooth-based contact tracing and health tracing), only location-based contact tracing requires smart contracts to update and store infection status in the database; in the other two services, users can query the blockchain database directly for personal contacts and visiting records. current bitcoin and ethereum blockchain database designs have problems such as high computational cost and slow writing and querying of data [ ], but there are now third-generation blockchain databases whose performance, such as throughput, is comparable to that of the visa credit card company's databases [ ] [ ]. since third-generation blockchain databases lack mature commercial deployments and a mature platform to support the development of smart contracts, we still deploy our system on the ethereum platform for simulation and evaluation.

in this section, we describe the location-based contact tracing service in detail: first, we show the entities participating in the service, such as users, smart contracts and locations; then we introduce the interaction activities between entities, including a user checking in at a place during a visit and a user querying the infection status of a place before visiting. as defined in previous sections, we have a location l, which represents a public place or form of transportation, users u with normal and infected status, and smart contracts, which are arranged in a hierarchical structure to process all users' check-in requests. in figure , we illustrate the location-based contact tracing service with a simplified state machine {q, δ, initialstate, cleanstatus}, where:
• q is the set of location states {emptystatus, infectedstatus, cleanstatus}
• δ is the transition function, driven by the events {infectedusercheckin, infecteduserupdate, locationiscleaned, wait days}
• initialstate is the null state before the smart contract exists
• cleanstatus is the accepting state

first, when a user issues a check-in request req_checkin through the location-based contact service at the second layer, the system checks whether there is a smart contract contract_location corresponding to this location at the third layer. if the contract does not exist, the smart contract of the city contract_city in which the location lies creates a smart contract contract_location for the location. if the contract already exists, the system goes to the next step and processes the user's check-in request. if the arriving user is infected, the location enters the next state, infectedstatus; otherwise, it enters cleanstatus. then, when the location is in the state infectedstatus, either after a certain number of days or once it has been cleaned and disinfected, the location can move to cleanstatus. finally, in the state cleanstatus, the location is affected by the request of the next user: if the next user is infected u_infected, or one of his past visits is updated as infected, then the location returns to the previous state infectedstatus; otherwise, it remains in the current cleanstatus (a compact sketch of these transitions follows below).

fig. gives an example of a user u checking in at a building in the location-based contact tracing service. the check-in information about this user's visit to the building, such as the timestamp t, the geographic position geopos and the health status {u_normal, u_infected}, is packaged into a transaction tx with the help of our proposed mobile client app in the second layer. then, the app sends the transaction tx to the smart contract group in the third layer.
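before following the check-in transaction through the contract hierarchy, the location state transitions described above can be sketched compactly. this is an off-chain python sketch for illustration only; the real logic lives in the solidity location contract, and the waiting threshold is an assumed parameter because the exact number of days is not fixed here.

```python
# compact sketch of the location state machine: emptystatus -> infectedstatus/cleanstatus.
EMPTY, INFECTED, CLEAN = "emptystatus", "infectedstatus", "cleanstatus"

class LocationContract:
    def __init__(self):
        self.state = EMPTY           # initialstate: contract has just been created

    def on_checkin(self, user_infected: bool):
        # any check-in moves the location out of the empty state
        self.state = INFECTED if user_infected else CLEAN

    def on_infected_update(self):
        # a past visitor later reports infection
        self.state = INFECTED

    def on_cleaned_or_waited(self, days_since_infection: int, cleaned: bool,
                             wait_days: int = 14):  # threshold is an assumption
        if self.state == INFECTED and (cleaned or days_since_infection >= wait_days):
            self.state = CLEAN

loc = LocationContract()
loc.on_checkin(user_infected=True)                       # location becomes infectedstatus
loc.on_cleaned_or_waited(days_since_infection=0, cleaned=True)
print(loc.state)                                         # cleanstatus
```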
according to the address of the building geopos, the transaction tx is passed from the state-level contract_state to the county-level contract_county, then to the city-level contract_city, and finally to the smart contract contract_location corresponding to this building. then, based on the health status provided by the user, u_infected for example, the location smart contract changes the infection status of the building to l_infected. in this process, the transaction tx that records the user's check-in information is saved in the blockchain database db.

the location-based contact tracing service can also help the user obtain the infection status record of a location l. we need to discuss two scenarios, depending on whether the user already holds the network address of the contract_location corresponding to the building. if the user has checked in at this location before, or has previously queried the smart contract of this location, the corresponding contract_location address should already exist in his mobile app. the user can therefore directly send a request to the contract_location network address through the mobile app to get the location's infection status. if the user is interacting with this location for the first time, he first needs to obtain the contract's network address. given the geographic position geopos of the location, the user's mobile app queries the state-level contract_state in the smart contract group, which transfers the query request from the top level down to the city-level contract_city based on the provided geopos. if a corresponding contract_location exists for this location, the network address of that contract is returned to the user; otherwise, the city-level smart contract contract_city creates a new contract_location and returns its address to the user. after receiving the location infection records from the smart contract service layer, the request sender can verify the response by querying his local blockchain database for the transaction record. if an infection exists at this location l_infected, our proposed mobile app alerts the user.

to encourage users to use the mobile services more frequently, we have developed a check-in and query incentive mechanism. whenever a user checks in at a location or queries the corresponding location smart contract, the smart contract returns a slightly larger amount of transaction fee to encourage the user to check in or query more often. the additional fee given to the user supports the user in using the mobile services to broadcast the transactions that contain his bluetooth contact data, check-in information and health status updates.

bluetooth-based contact tracing involves all entities except the smart contract group and all layers except the smart contract service layer. the mobile app in the second layer packs the bluetooth contact data into transactions, broadcasts them on the blockchain network, and saves all of them in the blockchain database. similarly, the mobile app processes real-time transactions received about senders' health status, matches user contact records and reminds users of the danger of contact with infected persons. in each transaction, we record four elements: the period ∆t_b during which the users detect each other, the detected mac addresses macaddress, the mobile phone model (appleinc, for example), and the received signal strength index rssi. fig. gives an overview of the workflow of bluetooth-based contact tracing.
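a small sketch of how these four elements might be packed into a transaction payload on the client before being broadcast is shown below; the field names, json layout and helper functions are assumptions for illustration rather than the app's actual wire format.

```python
# assemble a bluetooth contact record into a broadcastable transaction payload.
import json, secrets, time

def random_mac() -> str:
    """generate a randomized mac-style identifier used as a temporary identity."""
    return ":".join(f"{b:02x}" for b in secrets.token_bytes(6))

def build_contact_tx(peer_mac: str, dt_b: float, device_model: str, rssi: float) -> str:
    payload = {
        "sender_mac": random_mac(),   # rotating virtual identity
        "peer_mac": peer_mac,         # mac address detected via sniffing
        "dt_b": dt_b,                 # seconds the peer was detected
        "device_model": device_model, # assumed common handset model
        "rssi": rssi,                 # received signal strength index
        "timestamp": int(time.time()),
    }
    return json.dumps(payload)        # serialized before broadcast to the network

print(build_contact_tx("ab:cd:ef:01:23:45", dt_b=420.0, device_model="apple", rssi=-65.0))
```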
the bluetooth-based service helps users in the first layer exchange randomized mac addresses with each other, and then packs the data, together with a timestamp and the received (bluetooth) signal strength index rssi, into a transaction that is sent to the blockchain database in the data storage layer. the signal strength of different bluetooth devices differs, so we assume that users use the same type of apple mobile phone; this provides the assumption needed in the next section to calculate the distance between two users from the signal strength. apart from that, the bluetooth-based service alerts users when it receives transactions containing an infected health status from another user.

fig. shows that the mobile app in the second layer detects surrounding users through bluetooth, recording the time interval ∆t_b of direct contact between the users, the random mac address macaddr of the other user, the type of mobile device devicetype, and the range of rssi. the mobile app packs these segments into a transaction, broadcasts it in the blockchain network and stores it in the blockchain database. the mobile app also stores all the bluetooth mac addresses generated locally for the other service functionality, health tracing, which is discussed in the next section. another scenario of bluetooth-based contact tracing is that, if the mobile app receives a transaction containing another user's infected health status and his mac addresses, the bluetooth-based contact tracing queries its local blockchain database using these mac addresses and checks whether the current user has a contact record with the infected user. if they have a contact history, the mobile app alerts the user. the mobile app then transfers and re-broadcasts this transaction on the network to alert other users who may have a contact history.

we mentioned in a previous section that there are two types of challenges: 1. the balance between the number of random mac addresses generated by bluetooth and the protection of users' privacy, and 2. the balance between the number of random mac addresses and network congestion. for the first challenge, we adopt the method of changing the silent period [ ]. the silent period refers to the gap between a bluetooth device discarding its old mac address and adopting a new one [ ]; during this period, the bluetooth device cannot decide whether to use the new or the old address. the cited article points out that changing the length of the silent period clearly reduces the duration for which a bluetooth device can be tracked, so the device cannot be located easily [ ]. this is still an open question, and it is worth researching in the future. for the second challenge, we use weak randomization, which reduces the number of random addresses generated, so fewer transactions are generated to pack these mac addresses. then, in the blockchain network, each computing node and user can achieve data consistency without synchronizing a large number of communication requests.

health tracing is the third service in the proposed mobile app and it has two major functionalities: 1) broadcast the user's infection status to update the contracts' infection status and alert other users, and 2) estimate the probability of being infected. within the first functionality, users can update their infection status to u_infected or u_normal. when the user's status becomes infected, the mobile app automatically does two things:
1) it updates the infection status of all the contract_locations that the user visited over the preceding days, based on his visiting records, and 2) it broadcasts transactions containing his infected status and all the bluetooth randomized mac addresses generated over the preceding days, to alert others who had close contact with him. within the second functionality, users can estimate the probability of being infected based on the analysis of the four factors rssi, ∆t_b, ∆t_c and ms. referring to the formula in the earlier section, we believe these parameters and p(infection) should be formalized in the logit function given there, where β(ms) is formalized as a function of the material-surface value ms. after accessing real medical data and investigating public locations, we will calculate and validate this proposed formula.

we build a prototype system and evaluate its performance with the following experiments, in which we simulate users' daily contact and check-in activities with a poisson distribution. in the experiments, we focus on the average cost of processing requests and the total cost of operating our prototype system. first, we introduce the environment for the experiments. then, we evaluate and analyse the system's performance in terms of stability and scalability. in the future, we may deploy our system and build benchmarks for evaluation when a real dataset is available.

we conduct the experiments on a macbook pro running macos version . . , with an intel core i cpu at . ghz and gb of lpddr memory. we use the solidity programming language to develop and implement the smart contract group, which is deployed on a private ethereum blockchain simulated with the ganache tool. we then use python scripts for the data analysis. our experiments focus on three basic variables affecting performance: 1) the number of users |u|, increasing from to with even intervals of ; 2) the users' contact and check-in frequency freq, following a poisson distribution; and 3) the smart contract group size scg.size.

based on three quantities of deployed smart contracts and six different numbers of requests, increased from to , we measure the average gas cost of all requests and the standard deviation of that average cost over ten rounds of experiments. fig. shows that as the quantity of contracts increases from to , and the number of requests increases from to , the average request gas consumption is reduced by a factor of , from , , wei to , wei. fig. shows that with the increase in requests, the deviation of the request cost is reduced by a factor of . although the gas amounts and deviations in fig. and appear very large, the actual overhead is very small: wei is the smallest gas unit in ethereum, one ether is equal to 10^18 wei, and so, assuming an ether is worth $ , an , wei request is worth roughly $ , and the deviation of wei is negligible. we also find that, for different numbers of contracts, the average costs trend towards a similar amount of around , wei.

system overhead refers to the interaction costs between the mobile services and users and the operating costs of the smart contracts, which are measured in ethereum gas. fig. shows that when the number of requests increases from to , and the number of contracts increases from to , the overall gas consumption of the system increases linearly, considering both the case of the same number of requests with different numbers of contracts and the case of the same number of contracts with different numbers of requests. based on these three measurements, we believe that this prototype system has good stability and scalability.
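the simulated workload behind these measurements can be sketched as follows: per-user daily contact and check-in counts are drawn from a poisson distribution. the rate parameters and user count below are illustrative assumptions, not the values used in our experiments.

```python
# generate a poisson-distributed daily workload of check-in and contact requests.
import numpy as np

rng = np.random.default_rng(42)

def simulate_daily_requests(num_users: int, checkin_rate: float = 3.0,
                            contact_rate: float = 8.0):
    """return per-user counts of check-in and bluetooth-contact requests."""
    checkins = rng.poisson(checkin_rate, num_users)
    contacts = rng.poisson(contact_rate, num_users)
    return checkins, contacts

checkins, contacts = simulate_daily_requests(num_users=1000)
total_requests = checkins.sum() + contacts.sum()
print(total_requests)   # this request stream feeds the per-request gas-cost measurements
```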
with the increase in the number of requests and contracts, the request cost, which is the major overhead in the system's operation, approaches a stable lower bound. this trend supports the stability of the system. similarly, the system overhead grows linearly rather than exponentially, which is acceptable performance.

in the field of contact tracing, mit [ ] and apple and google [ ] have related products and projects. however, their solutions involve either a centralized database built into the system or incomplete privacy protection for users. such designs cannot meet the requirements of user privacy. in terms of data security, the smart contract guarantees consistent execution of operations and therefore a consistent output. one paper presents a computing-resource trading platform based on a smart-contract-based edge computing network [ ]. that design uses a tree-structured smart contract group similar to ours, but its implementation focuses on how to match users and complete resource trades, whereas the purpose of our smart contracts is to record the infection status of locations. regarding privacy, there are articles that use differential privacy algorithms in iot data processing [ ], but differential privacy methods return a relatively accurate output with noise added. this conflicts with a property of the blockchain: when storing data, the later block must verify the correctness of the data in the previous block and cannot tolerate deviations. it could nevertheless be interesting to see research at the intersection of these two fields.

in this paper, we design a tracing and notification system based on blockchain and smart contracts, which provides three types of services: location-based contact tracing, bluetooth-based contact tracing and health tracing. our system can trace the user's travel and contact history, and can remind the user about past contacts that may cause infection. in addition, users can estimate the probability of being infected through the health tracing service. to protect users' privacy, they can anonymously send their visiting records and health status to the blockchain platform. at the same time, users can use a large number of random mac addresses provided by bluetooth technology as temporary identities to further protect their privacy. in addition, the smart contract group embedded in the system records the infection status of each location and performs the same sequence of check-in operations to ensure that each user gets consistent infection results for a location. we also simulate the interaction between users and our prototype system, and then evaluate its performance, including gas consumption, operating stability and request processing speed. in a simulated environment, our system shows good scalability and stability. we expect to obtain real data about user contact records to evaluate our system in the future.
• pact: private automated contact tracing
• apple and google partner on covid-19 contact tracing technology
• covid-19 coronavirus pandemic cases in the u.s
• we care world covid tracing
• laboratory testing for coronavirus disease (covid-19) in suspected human cases: interim guidance
• protocol for assessment of potential risk factors for coronavirus disease (covid-19) among health workers in a health care setting
• ethereum: a secure decentralised generalised transaction ledger, byzantium version
• sustainability of coronavirus on different surfaces
• covid-19 diagnostic based on mit technology might be tested on patient samples soon
• a sneeze
• assessment of risk factors for coronavirus disease (covid-19) in health workers: protocol for a case-control study
• bleb: bluetooth low energy botnet for large scale individual tracking
• handoff all your privacy: a review of apple's bluetooth low energy continuity protocol
• on scaling decentralized blockchains
• machine learning: a probabilistic perspective
• distance estimation of smart device using bluetooth
• biomodels database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems
• cassandra: a decentralized structured storage system
• exploring the attack surface of blockchain: a comprehensive survey
• how to time-stamp a digital document
• bigchaindb: a scalable blockchain database
• operational performance data
• enhancing wireless location privacy using silent period
• smart contract-based computing resources trading in edge computing
• a lightweight privacy-preserving data aggregation scheme for fog computing-enhanced iot

key: cord- -x xflm authors: frauenstein, edwin donald; flowerday, stephen title: susceptibility to phishing on social network sites: a personality information processing model date: - - journal: comput secur doi: . /j.cose. . sha: doc_id: cord_uid: x xflm
today, the traditional approach used to conduct phishing attacks through email and spoofed websites has evolved to include social network sites (snss). this is because phishers are able to use similar methods to entice social network users to click on malicious links masquerading as fake news, controversial videos and other opportunities thought to be attractive or beneficial to the victim. snss are a phisher's "market" as they offer phishers a wide range of targets and take advantage of opportunities that exploit the behavioural vulnerabilities of their users. as such, it is important to further investigate aspects affecting behaviour when users are presented with phishing. based on the literature studied, this research presents a theoretical model to address phishing susceptibility on snss. using data collected from respondents, the study examined the mediating role that information processing plays with regard to user susceptibility to social network phishing based on their personality traits, thereby identifying user characteristics that may be more susceptible than others to phishing on snss. the results from the structural equation modeling (sem) analysis revealed that conscientious users were found to have a negative influence on heuristic processing, and are thus less susceptible to phishing on snss. the study also confirmed that heuristic processing increases susceptibility to phishing, thus supporting prior studies in this area.
this research contributes to the information security discipline as it is one of the first to examine the effect of the relationship between the big five personality model and the heuristic-systematic model of information processing. for several years, the anti-phishing working group (apwg) has defined phishing as "a criminal mechanism employing both social engineering and technical subterfuge to steal consumers' personal identity data and financial account credentials" (apwg). lastdrager ( ) defines phishing as an "act of deception whereby impersonation is used to obtain information from a target". it is apparent that in both definitions users are deceived in some way into giving out information to the attacker. however, these definitions do not elaborate on the channel or environment in which phishing may be executed, or the attack vector, nor is any mention made of the use of persuasion to make phishing effective. phishing is regarded as a type of internet fraud typically carried out by sending victims an email ostensibly from a legitimate organisation or individual. phishing emails often include hyperlinks that lead victims to spoofed websites; however, they can also include attachments which users may unknowingly download, installing spyware that data-mines the victim's computer for usernames, passwords and credit card information (harrison et al., ). the term "phishing" was coined as early as , when attempts were made to steal passwords from the accounts of america online (aol) users (ollmann, ). several decades later, annual reports released by various information security organisations continue to emphasise the impact phishing has on numerous industries and their customers today (apwg, ). almost daily, large successful organisations make headlines by falling victim to some form of phishing, resulting in substantial monetary losses. in november , facebook announced that of its . billion monthly active users, approximately million user accounts could be duplicate or fraudulent (titcomb, ). earlier the same year, facebook and google were defrauded of more than $ through a phishing scheme that impersonated a large asian-based manufacturer (united states department of justice, ). phishing is reported to be the fifth most common primary cause of security incidents, is ranked as the main cause of data breaches, and has the highest success rate of any threat vector (verizon, ). in the third quarter of , phishing attacks rose to heights not previously seen since late (apwg, ). sophisticated phishing attacks have continued to target mobile banking users and to deceive them into submitting bank-related information after having received a phishing email or an sms containing a link on their mobile phones (choudhary and jain, ). this has allowed phishers to fraudulently perform a "sim swap", thereby giving them access to text messages directed to the victim's cell phone number (samunderu, ). the phishers are then able to add beneficiaries to the victim's online internet banking profile because the phisher will have access to confirm the two-factor authentication (2fa) code typically sent to the victim's mobile device. furthermore, phishers are creating secure websites, thus foiling efforts to educate users into relying on https as a means of adjudicating whether a website is safe or not (apwg, ). phishers took advantage of the public's fear and uncertainty over the coronavirus global pandemic.
since the start of january , covid-19-related email phishing attacks saw a steady increase followed by a sudden surge of % by the end of february, according to security firm barracuda networks (muncaster, ). covid-19-themed phishing attacks were in the form of scams ( %), brand impersonation ( %), blackmail ( %) and business email compromise ( %). in april , google blocked more than million covid-19-related emails consisting of malware and phishing, and million covid-related daily spam messages (musil, ). approaches to mitigating phishing have focused mainly on two control measures: technological controls and user education. mass phishing attacks have to a large extent gone the route of becoming spam, thus relying on server-side filter technologies to prevent them from reaching users (ollmann, ). however, for decades, the information security literature has emphasised that relying on technology alone is insufficient to counter phishing threats today, because such attacks focus to a large extent on exploiting human vulnerabilities rather than technical vulnerabilities. for this reason, many information security scholars mention the phrase "humans are the weakest link" (yan et al., ). this has resulted in efforts to make users aware of phishing by improving security awareness, training and education programs (volkamer et al., ). however, while it has been shown that security education training campaigns have indeed had an impact on user awareness of security threats, this has not produced the desired results, as users who consider themselves to be aware of security threats have not demonstrated actual awareness (caldwell, ). furthermore, when faced with phishing, users may be preoccupied with other activities and thus not motivated to consider the security aspects associated with the threat (moreno-fernández et al., ). as such, to save time and effort, users may resort to various "cognitive shortcuts" when attempting to make decisions about the authenticity of a message (vishwanath et al., ). this brings to the fore information processing as a variable which is considered in the current study. today, the range of communication technologies available has expanded, especially in the mobile device market, to include instant messenger (im) and social applications. phishing is versatile as it is not only carried out in emails and on fake websites, but also in other environments such as in text messages and on social networking sites (snss) (aleroud and zhou, ; vishwanath, a). while users appear to be aware of phishing emails impersonating financial institutions (frauenstein, ), many may be unaware of modern social network threats and associated methods (fire et al., ; sophos, ). krombholz et al. ( ) point out that users' awareness of social engineering (se) on snss is still comparatively low compared to emails. there is currently no universally accepted term for phishing conducted on snss. as such, phishing on snss is sometimes referred to as social phishing (jagatic et al., ), social media phishing (vishwanath, a) or social network phishing (frauenstein and flowerday, ). phishing conducted on snss reaches a far wider audience than traditional email phishing, consequently affecting both businesses and consumers (wilcox and bhattacharya, ). the core se principles employed in traditional phishing emails have also expanded to social network environments (frauenstein and flowerday, ; tsikerdekis and zeadally, ).
according to vishwanath ( a, b), social network phishing attacks are multi-staged whereas email phishing is single-staged, although both share common techniques. as with phishing emails, on snss phishers exploit the technical features offered, creating fake accounts and distributing malicious content (fire et al., ). phishers may take advantage of controversial or significant events that garner public interest or trigger emotions by creating clickbait posts on snss that "attract attention and encourage visitors to click on a link to a particular web page". attackers have been using snss to launch attacks on organisations by using various methods, including phishing, clickjacking, financial abuse, identity theft, impersonation, and physical crime. in addition to this, social network users are exposed to other risks such as cyberbullying, sexting, embarrassing photos, public sharing of locations, and the spread of dangerous pranks and games (algarni et al., ; branley and covey, ; reznik, ). facebook users in particular are at greater risk of focused attacks such as spear phishing and clickjacking (adewole et al., ; algarni et al., ). in south africa, the rapidly growing facebook group #imstaying was established to create unity in a country that had experienced political and economic instability. members post uplifting messages to inspire hope in others. however, anecdotal evidence suggests that groups such as these could be lucrative targets for cyber criminals who may conduct malicious se attacks and steal personal or confidential information (mckane, ). this might be because social network users who join particular social network groups typically share common interests and exhibit similar behaviours. in the case of #imstaying, as these members are generally like-minded and positively disposed, they may be open to giving out personal information such as their mobile number on these groups to help others in need, which in turn can be used by social engineers to conduct further attacks. millions of different email addresses can be collected by phishers simply by using the usernames of members of snss (polakis et al., ). despite these risks, snss lack effective techniques for predicting, detecting or controlling se attacks (algarni et al., ). in addition, there generally appear to be inadequate or outdated laws to deal with the internet identity theft and online impersonation prevalent on snss (reznik, ). social network users, on their part, exhibit unsafe behaviours that range from failing to implement privacy controls to clicking on links originating from seemingly trustworthy sources and not giving enough attention or thought to the content of messages. research has shown that the activities performed on snss reveal specific personality characteristics (waheed et al., ), and in these contexts attacks that exploit certain personality types in victims have been found to be successful (parish et al., ). specific types of users may be more vulnerable than others to particular forms of persuasion techniques employed by phishers (lawson et al., ; pattinson et al., ). on snss, user interaction with content is mainly click-based and information is presented spontaneously with little cognitive effort required on the part of the user. this adds another dimension to research in information security, as users may overlook suspicious messages by resorting to a heuristic approach instead of a systematic approach to processing information.
this may be attributed to the design of the underlying software of the sns itself, in that it relaxes users and thus could make them less aware of the potential risks of being deceived (tsikerdekis and zeadally, ). this section highlighted the inadequacy of current methods for addressing phishing, especially in the case of snss. phishing programmes and the literature focus on training users to identify phishing emails and spoofed websites, but little attention has been given to social network phishing and other influential factors and their effects on user information processing (vishwanath, b). furthermore, there is a lack of research on the individual differences that lead to susceptibility to online scams (williams et al., ). this study shows that the techniques used in phishing emails can also be employed on snss, thus calling attention to the need for future phishing attack definitions and taxonomies to be redefined. importantly, this study proposes a theoretical model that can help identify the types of users who are more likely to be susceptible to phishing on snss, and is an essential step towards improving online security. phishing victimisation is a behavioural problem and, as such, researchers have focused mainly on understanding the factors that influence user behaviour (wright and marett, ). various studies report that users' intentions to behave securely may differ from their actual security behaviour (guo et al., ). furthermore, it is difficult to predict user behaviour even when users have knowledge and awareness of security threats (halevi et al., ), as some users willingly give up sensitive information despite their awareness of these threats (workman, a). users who perceive themselves to be competent in using computers are just as likely to be phished as those who are not (vishwanath et al., ). it has also been shown that users' attitudes to risk do not correlate with them being more or less vulnerable to phishing (halevi et al., ). in this regard, yan et al. ( ) maintain that identifying ordinary users as the weakest link is too general, and that specific users should be determined through quantitative assessment. moreover, williams et al. ( ) recommend that further research be conducted on the influential factors that affect user behaviour. as such, this study follows this recommendation by identifying particular users susceptible to social network phishing by their personality traits, as this is one of the factors that have recently been found to influence user behaviour (shropshire et al., ). moreover, based on personality traits, this study identified the processing "mode" users would choose when confronted with phishing on snss. the strength of phishing lies in its use of se techniques to manipulate the victim into carrying out actions that are unsafe or divulging confidential information (mitnick and simon, ). this can be effectively achieved by impersonating trustworthy or reputable sources such as a financial institution, government agency or the victim's own employer organisation. phishers also make use of visual cues by replicating corporate logos and slogans of organisations to increase the users' trust in the message (moreno-fernández et al., ). the content and arguments in the body of the message can also effectively trigger human emotions (e.g. fear or excitement) and influence cognitive abilities, a ploy that is reinforced by the use of persuasion principles.
behavioural vulnerabilities can be exploited through persuasion such as gullibility and optimistic bias ( bullée et al., ) . phishers also take advantage of current and popular events, beliefs, prize offers, religion and politics to obtain a response from the victim. these techniques can influence information processing by the victim, who may not give sufficient attention to validating the authenticity of the message ( vishwanath et al., ) . cialdini ( ) identified six key principles of persuasion, namely, reciprocity, commitment or consistency, social proof or conformity, authority, liking, and scarcity. while these techniques have been used in phishing emails, they can also be employed on snss ( algarni et al., ) . this section demonstrates how effective these principles can be in persuading users to perform certain actions on snss. the images used in each of the persuasion principles are real-world cases personally obtained by the researcher. a message is made to appear helpful and thus the user feels obligated to do something in return; for example to share a message warning others that there is a possibility that their facebook account could be hacked. fig. gives an example of how this technique is used in facebook messenger. fig. is not considered an example of social network phishing but rather of the hoax messages found on facebook. however, such hoax messages could effectively lead to phishing if the user is requested to click on a link with a message stating, "click on the link to see if your profile has been hacked too". users might comply with the instruction especially if the message originates from a trusted friend. facebook responses such as complimenting, commenting on, or liking another user's posts can contribute towards developing a relationship between users, thus encouraging them to accept each other's requests ( algarni et al., ) . the commitment principle refers to the likelihood of dedicating oneself to a cause or idea after having made a promise or agreement to do something ( cialdini, ) . typically, once people have made a choice to commit to something, they will encounter personal and interpersonal pressures to behave consistently with that commitment. according to cialdini ( ) , such pressures will cause people to respond in ways that justify their original decision to commit. further, people will be more confident in their decision to commit especially if they make it known publicly ( ferreira et al., ) . in this regard, snss could be perceived by users as a suitable platform for making their commitment to something known. the authority principle is the most used persuasion technique in phishing ( akbar, ) . messages designed to appear as if they originate from an authoritative or trustworthy entity (e.g. a bank, or from the recipient's employer or a friend) may persuade users to feel obligated to obey or respond to requests. this is because social learning encourages people not to question authority and therefore they are conditioned or may feel obligated to respond ( ferreira et al., ) . on snss, this technique may be effective if the attacker has created an attractive profile or page with fabricated information intended to make it appear legitimate. the fake profile may also have many followers, mutual friends, recent updates and interesting photos, thus increasing the user's trust. alternatively, the attacker could impersonate a public figure, clone a profile or pretend to be someone that the victim may trust ( stajano and wilson, ). 
an earlier study by jagatic et al. ( ) found that subjects were more likely to respond when the phishing email appeared to have been sent by a friend. in fig. , a well-recognised multinational retail company is impersonated on facebook with a claim to offer free shopping vouchers. persuasion is further enhanced by creating urgency, as the fake shopping voucher is only valid for a limited period. the tendency to imitate the behaviour of members of a group is known as social proof. people will comply with a request if they see others have also complied (cialdini, ). for example, a message is shared on facebook by the user's friends and the user in turn shares the post with others in his or her social network. in fig. , the message preys on users' christian beliefs, as the message includes an image of jesus and requests the user to "share" it. a message that incorporates "scarcity" may create a sense of urgency by putting the user under pressure to act. the user may respond in order to avoid missing out on an opportunity, a product, a service, or information, especially if it has limited availability (bullée et al., ). urgency can be enhanced by adding a consequence or a timeframe to the message (e.g. a special discount or a prize valid for a certain period), as seen in fig. . in fig. , persuasion is further enhanced by impersonating the internationally known american comedian ellen degeneres, thus taking advantage of the principle of authority. evidently, users responded quickly to the request, after which they received further instructions to register their name and to download a movie. in order to become a winner, users were required to click on the shortened url link concealing the site that the user was directed to. it is apparent that the incorrect spelling of ellen's name did not affect the trust of the respondents. people may be persuaded to obey others if they display certain favourable or familiar characteristics (ferreira et al., ). snss provide an environment that encourages "liking", as there are built-in features that allow the user to indicate their support for posts by means of a reaction such as "liking" or emotion indicators. people typically like or prefer to be associated with people who are similar to them in terms of personal interests, attitudes, beliefs, opinions, backgrounds, personality types and so on (bullée et al., ). for example, a facebook user may receive an invitation to accept a friend request but, before accepting the request, he or she may seek information on the sender in relation to the number of friends they have in common, photo albums, occupation and where they live. if there are characteristics that the user likes, they may decide to accept the invitation or comply with a request. if the user agrees strongly with the sender on something important to them, the likelihood of responding increases. phishers take advantage of current affairs, controversial news and events reported in social media, thus preying on users' interests and curiosity. krombholz et al. ( ) note that "curiosity" is a technique overlooked by cialdini ( ). however, curiosity has been equated with the openness to experience personality trait (mcelroy et al., ). fig. shows how the curiosity technique can be employed on facebook messenger. heartfield and loukas ( ) classified this type of attack as instant message phishing. in fig. ,
the effectiveness of this technique is enhanced by visual cues, as the message includes the statement "really" with a shocked emotion icon, as well as an exact image of the victim's profile picture. it also prompts the user's attention and creates urgency, as it indicates that hundreds of thousands of users have already viewed the video. although not considered to be part of cialdini's persuasion taxonomy, this technique could use "fear" in order to create urgency (workman, b). interestingly, user training interventions have made use of "fear appeals" as a means to counteract phishing attacks (jansen and van schaik, ; schuetz et al., ). in these scenarios, if the persuasion principles are used in combination, this may influence the way in which the user responds. for example, lawson et al. ( ) found that a combination of the authority and scarcity persuasion principles was most likely to arouse suspicion in relation to phishing emails. furthermore, the context in which persuasive techniques are executed can also play a substantial role in the success of a se attack (bullée et al., ). as a result, identifying which persuasion techniques users in general are more likely to fall victim to is difficult. accordingly, as the phishing literature suggests, it is important to consider users' personality traits as another vulnerability factor (parish et al., ). personality traits describe individual differences in terms of characteristic thoughts, feelings and behaviours (funder, ). personalities are unique to each individual as they are predominantly determined by genetics, social and environmental influences, and experiences (mccrae and john, ). personality characteristics are integral to the way humans think and behave, and therefore have an influence on whether or not an individual is likely, be it intentionally or unintentionally, to become involved in malicious activities or risky behaviour (nurse et al., ). personality is considered a leading factor in understanding why people behave as they do on the internet (amichai-hamburger, ). personality traits are also influenced by gender differences, which subsequently affect internet usage behaviour (amichai-hamburger and ben-artzi, ). prior literature has also investigated personality traits and their influence on social network use (amichai-hamburger and vinitzky, ; correa et al., ; moore and mcelroy, ; ryan and xenos, ). personality traits can also predict the security behaviour intentions of users towards protecting their computer devices (gratian et al., ) and can also have a significant effect on perceived trust and risk, thus affecting decision making (cho et al., ). research involving personality traits has been a topic of interest for a number of decades, with several rating instruments applied in many studies across various disciplines and contexts (costa and mccrae, ; john and srivastava, ). scholars, particularly in the psychology domain, continue to explore a variety of focus areas within personality trait research. for example, anxiety and anger, which are among the neuroticism personality traits, are positively associated with risky driving behaviour (yang et al., ). in the information security domain, studies that involve personality traits have gained the interest of scholars, as certain traits are considered important predictors of human behaviour (albladi and weir, ; gratian et al., ).
of the many types of personality scales scholars can adopt, the big five has been noted as the most widely accepted, as it shows consistency across time, culture and age groups and is considered more structured because the five traits do not overlap with each other (erder and pureur, ). the five-factor model (ffm), consisting of the "big five" personality traits, is the most widely used and extensively researched model of personality (john and srivastava, ; mccrae and costa jr, ). it comprises the five empirically derived factors or dimensions of openness, conscientiousness, extraversion, agreeableness and neuroticism, which are usually represented by the acronym ocean or canoe. the taxonomy now known as the "big five" has resulted from numerous improvements, refinements and iterations, which have led to a wide array of personality scales. on the other hand, prior literature has also examined the "dark triad" personality traits, consisting of psychopathy, machiavellianism and narcissism, and their influence on the behaviour of facebook users (lopes and yu, ). combining the descriptions of the big five personality traits given by (zhang, ; john and srivastava, ; rolland, ), the five personality traits are described as follows:
• openness to experience is the personality trait related to people who are open-minded and seek new experiences, have an active imagination, and focus on intellectual pursuits. they tend to be independent of judgement and have an appreciation for art, nature and different ideas and beliefs.
• conscientiousness refers to individuals who are honest, trustworthy, neat and hardworking. they have self-discipline, are goal-oriented, are prudent and tend to follow rules, standards and procedures.
• extraversion is the personality trait attributed to individuals who tend to experience positive emotions such as excitement. they prefer to work with others and tend to be sociable, energetic, talkative, assertive, impulsive and dominant.
• agreeableness is attributed to individuals who are tolerant, compassionate, modest, polite, cooperative and trusting of others, as they believe that the people they interact with are generally well intentioned and honest. they also value and respect other people's beliefs and conventions.
• neuroticism is the opposite of emotional stability and is attributed to individuals who tend to experience negative emotions such as pessimism, embarrassment and guilt. such people are generally sad or nervous, and sometimes hot-tempered, and tend to have low self-esteem.
pertaining to cialdini's principles of persuasion mentioned earlier, prior research investigated whether certain users, based on their personality type, may be more susceptible to specific persuasion techniques (gkika et al., ). others investigated personality traits and the influence persuasion strategies have on users' detection of phishing emails (butavicius et al., ; lawson et al., ; oyibo et al., ; uebelacker and quiel, ). researchers have also focused on exploring the influence of gender and personality traits on phishing susceptibility (halevi et al., ; mayhorn et al., ; parish et al., ; pattinson et al., ; sumner et al., ). halevi et al. ( ) examined the relationship between the big five personality traits and email phishing responses, as well as how these traits affect users' privacy behaviour on facebook. their study revealed that % of the respondents had been "phished" and found a correlation between gender and personality traits.
for women, a very high correlation to neuroticism was found, while for men no correlation was found to any personality trait, although neuroticism and openness had an inverse correlation to extraversion. halevi et al. ( ) found that the tendency to share information on facebook correlated mainly with openness, while halevi et al. ( ) found conscientiousness to be most at risk to spear phishing. pattinson et al. ( ) investigated the behavioural responses of users when presented with phishing emails and found that those with the personality traits of extraversion and openness were better at detecting phishing emails. however, studies by albladi and weir ( ) and lawson et al. ( ) presented opposing findings as they found that high extraversion increased susceptibility to phishing attacks. furthermore, alseadoon et al. ( ) found that openness, extraversion and agreeableness increase user tendency to comply with phishing email requests. these contradictions were noted by albladi and weir ( ) , who found that conscientiousness, agreeableness and neuroticism significantly decrease the user's susceptibility to phishing on snss. they propose that other factors mediate the involvement of personality traits such as the individual's competence level, motivation to use the services of social networks, trust in social network members and providers, and users' experience of cybercrime ( albladi and weir, ) . although "scepticism" is not regarded as a big five trait, in the cyberworld it would be preferable if users could adopt this trait, as a "trust no one" approach may encourage users to exercise more caution when receiving requests. in the current study, we explored whether information processing could be one of the mediating factors influenced by personality traits. as phishers constantly improve the authenticity of spoofed websites, the visual discrepancies between spoofed websites and their original counterparts are often difficult for users to detect. prior studies have referred to existing theories and have designed models to understand the phenomenon of phishing ( algarni and xu, ) . vishwanath et al. ( ) state that social-psychological research on phishing has identified a lack of cognitive processing as the main reason for individual victimisation. persuasion is one of the key factors that influence information processing in online environments ( guadagno and cialdini, ) . the effectiveness of persuasive communication increases if the message is relevant to the target audience ( petty and cacioppo, ) . as this study also posits that social network users are vulnerable to phishing because they do not process persuasive messages with enough circumspection, theories and models related to information processing were considered. in this context, popular persuasion theories and models include the elaboration likelihood model (elm), the heuristic-systematic model of information processing (hsm) and social judgement theory ( cameron, ) . recently, the hsm has received favourable attention from information security researchers as a suitable theoretical framework for understanding victimisation by phishing ( harrison et al., ; luo et al., ; valecha et al., ; vishwanath, b ; vishwanath et al., ; zhang et al., ). an earlier study by furnell ( ) presented participants with messages and asked them to judge the authenticity of each message. participants subsequently gave insights on the aspects that influenced their choices. 
according to furnell ( ) , most of the responses could be classified as follows: visual factors (e.g. logos, symbols such as copyright and trademarks, font styles), technical indications (e.g. the url in messages, the use of "https"), and language and content characteristics of the messages (language errors, presence or absence of recipient details, style of the message). furnell ( ) notes that despite these useful insights, participants often arrived at "incorrect conclusions". although furnell's study did not investigate information processing, it highlights that in evaluating messages users took a heuristic route, focusing more on visual characteristics than on the quality of the argument in the message. ironically, phishers use visual characteristics to their advantage with the aim of enhancing users' trust. as pointed out earlier, persuasion is one of the means by which phishers successfully trick their victims. the hsm originated from persuasion research in social psychology ( eagly and chaiken, ) and attempts to explain individual information processing and attitude formation in persuasive contexts. dual-process models, such as the elm and the hsm, are the most influential persuasion paradigms ( crano and prislin, ) . both propose two principal routes to persuasion: the central route (i.e. systematic) and the peripheral route (i.e. heuristic). scholars have used the elm, designed by petty and cacioppo ( ) , to describe how cognitive processing influences deception ( vishwanath et al., ) . the key difference between the two models is that the hsm recognises that the two distinct modes of thinking about information can co-occur, while the elm suggests that information processing occurs along a continuum. according to harris and yates ( ) , users evaluate phishing based on two main criteria: the visual quality of the message and the quality of the message argument, of which the latter requires more effort before a decision can be made. visual quality is concerned with aspects such as the source address, company logos, grammar, context and the instruction given in the message ( wang et al., ) . eagly and chaiken ( ) explain heuristic processing, compared with systematic processing, as "a limited mode of information processing that requires less cognitive effort and fewer cognitive resources" (p. ). heuristic processing relies on simple decision prompts, often termed "rules of thumb", and occurs when people lack motivation or cognitive resources. this mode of processing operates at a shallow or surface level, allowing the receiver to form judgements based on factors or indicators such as trustworthiness, appeal and the length of the message ( cameron, ) , all of which are cues exploited by the se techniques phishers use. luo et al. ( ) add that heuristic processing draws on these factors to allow the user to conduct a swift validity assessment. in contrast, luo et al. ( ) state that systematic processing takes place when users thoughtfully analyse the content of the message and perform further investigations to validate its authenticity. workman ( a ) states that phishing messages are typically designed to discourage systematic processing. ideally, systematic processing would be the preferred mode when users are engaged on snss. however, this type of processing requires more effort, time and cognitive resources.
systematic processing depends not only on one's capacity to think critically but also on other factors such as existing knowledge, self-efficacy in obtaining relevant information, and the perceived usefulness and credibility of available information ( griffin et al., ) . moreover, users may be involved with other information-seeking activities, using different software applications, which distract them. in this regard, ivaturi et al. ( ) suggest that users may not be in the correct frame of mind when presented with security attacks, leaving them vulnerable. moreover, as snss include both asynchronous (i.e. personal messages sent within the sns) and synchronous (i.e. embedded chat functions within the sns) modes of communication ( kuss and griffiths, ) , this too can distract users. taking this into consideration, users may limit systematic processing unless they are motivated to engage in it ( chen et al., ) . if users consider determining the validity of a phishing message on an sns too time-consuming, difficult or unimportant, they may resort to heuristic processing. human emotions may also interfere with users' judgement of message content ( workman, ) . personality traits can also influence these decisions: cho et al. ( ) found that the traits of agreeableness and neuroticism can affect decision making, as they have a significant influence on whether users perceive information as trusted or distrusted. ideally, if users were to systematically process the information they receive, checking it for validity and paying attention to visual cues, there would be fewer phishing victims. the previous section discussed the literature on the relationships between personality traits and phishing susceptibility and, in doing so, revealed contradictory findings. moreover, little research was found that investigated the relationship between personality traits and information processing. vishwanath ( b ) states that decades of empirical research have failed to show any relationship between the big five personality traits and information processing; however, prior research has not investigated these aspects in a social network and phishing context. as a result, the formulation of hypotheses for the present study was affected by the following limitations: ( ) contradictory findings in the literature on the effects personality traits have on phishing susceptibility, and ( ) the fact that prior literature has examined personality traits and information processing separately from each other. it is the second of these limitations to which the present study makes a contribution. hypothesis formulation therefore relied mainly on prior literature that described the characteristics of personality traits, and on literature, albeit contradictory, that examined their influence on users presented with phishing (outlined in section . ). based on this, the study hypothesises the following:
• h a: extraversion has a positive influence on heuristic processing.
• h b: extraversion has a negative influence on systematic processing.
• h a: agreeableness has a positive influence on heuristic processing.
• h b: agreeableness has a positive influence on systematic processing.
• h a: conscientiousness has a negative influence on heuristic processing.
• h b: conscientiousness has a positive influence on systematic processing.
• h a: neuroticism has a negative influence on heuristic processing.
• h b: neuroticism has a positive influence on systematic processing.
• h a: openness has a positive influence on heuristic processing.
• h b: openness has a negative influence on systematic processing.
• h : heuristic processing will increase the likelihood of susceptibility to social network phishing.
• h : systematic processing will decrease the likelihood of susceptibility to social network phishing.
in summary, the proposed theoretical model in fig. consists of three major components: personality traits, information processing and phishing susceptibility, with hypothesised associations between them. the personality traits comprise five latent variables (the big five), each proposed to have an influence on information processing. information processing comprises heuristic and systematic processing and is proposed to affect the likelihood of an individual falling victim to phishing on snss. our sampling frame was a convenience sample drawn from final-year undergraduate students enrolled in various courses at a south african university located across three different sites in the eastern cape province. the total population consisted of final-year engineering students. as this study aimed to achieve a % confidence level, a minimum of respondents was required ( kothari, ) . the choice of students was based on the following reasons. firstly, students are actively engaged on snss ( dixit and prakash, ) . secondly, final-year students, rather than students at any other level, were chosen on the notion that they may bring security risks to the organisations they anticipate working for in the following year. finally, university students are more susceptible to email phishing attacks ( bailey et al., ) . surveymonkey®, an online survey tool, was used to collect primary data, and approval was granted by the university where the target sample was located. we collected data from respondents, of which seventy cases had incomplete responses and were removed from the analysis. the final sample consisted of respondents, of whom were male ( %) and were female ( %), with a mean age of . years (s.d. = . ). sheng et al. ( ) found that participants in this age group were more likely to fall victim to phishing than people of other ages.
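for readers who wish to retrace the sample-size reasoning above, the sketch below applies cochran's well-known formula with a finite-population correction. the confidence level, margin of error and population size shown are illustrative assumptions only, since the study's exact figures are not reproduced here.

```python
import math

def cochran_sample_size(z: float = 1.96, margin: float = 0.05, p: float = 0.5) -> float:
    """Initial (infinite-population) sample size under Cochran's formula."""
    return (z ** 2) * p * (1 - p) / margin ** 2

def finite_population_correction(n0: float, population: int) -> float:
    """Adjust the initial estimate for a finite population of known size."""
    return n0 / (1 + (n0 - 1) / population)

# Illustrative assumptions only: 95% confidence (z = 1.96), 5% margin of error,
# maximum variability (p = 0.5) and a hypothetical population of 500 students.
n0 = cochran_sample_size()
print(math.ceil(finite_population_correction(n0, population=500)))  # about 218
```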
as a self-reported questionnaire completed in a single sitting was the sole method used to collect data, there is a risk that the tested relationships among the constructs might be distorted by common method variance (cmv) ( podsakoff and organ, ) . cmv is variance "attributable to the measurement method rather than to the constructs the measures represent" ( podsakoff et al., ) , and it can lead to incorrect conclusions concerning the reliability and validity of the item scales. two main approaches can help address cmv: procedural and statistical. the procedural or preventative approach (an ex-ante technique) is preferred and is applied early in the research design stage, while the statistical approach (an ex-post technique) is conducted in the empirical stage to detect or possibly eliminate cmv ( chang et al., ; podsakoff et al., ) . following the guidelines of podsakoff et al. ( ) , we employed the following procedural strategies to minimise cmv. an initial version of the survey was pilot-tested to establish whether the research instrument could be considered reliable, and pilot respondents were asked to provide feedback on any questions they found unclear or misinterpreted. to ensure that the survey was clear and unambiguous, we included synonyms in parentheses where necessary for some of the personality scale items, for example "can be tense (i.e. nervous, anxious)". to encourage honest responses, the respondents were informed of the purpose of the study and told that participation was voluntary, that there were no right or wrong answers, and that they could withdraw from the survey at any time. data was collected anonymously and no identifiable personal information was requested from the respondents. to detect whether cmv was present, we performed two statistical tests. first, we conducted harman's single-factor test by including all the variables in a principal component factor analysis ( podsakoff et al., ) . if the total variance explained by a single factor is less than %, cmv is unlikely to be a concern. our results show that the largest variance explained by a single factor was . %, indicating that no single emergent factor could explain the majority of the covariance. second, bagozzi et al. ( ) suggested that cmv can affect the discriminant validity of the constructs. we therefore examined the correlation matrix (in table ) to determine whether the correlation between any pair of constructs exceeded . , a procedure also performed by pavlou et al. ( ) . as the correlations were below . , cmv is unlikely to be a significant issue ( bagozzi et al., ) .
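the harman's single-factor check described above can be approximated by loading every likert item into a single unrotated principal component and inspecting the share of variance it explains. the sketch below is illustrative only: the dataframe, the number of item columns and the random placeholder data are assumptions, not the study's data.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def harman_single_factor_share(items: pd.DataFrame) -> float:
    """Share of total variance explained by the first unrotated principal component."""
    standardised = (items - items.mean()) / items.std(ddof=0)
    return float(PCA(n_components=1).fit(standardised.values).explained_variance_ratio_[0])

# Placeholder data: a hypothetical DataFrame with one column per Likert item.
rng = np.random.default_rng(seed=1)
survey_items = pd.DataFrame(rng.integers(1, 6, size=(200, 40)))
share = harman_single_factor_share(survey_items)
print(f"first component explains {share:.1%} of the variance")  # a concern only above ~50%
```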
the measures and individual items for personality traits and information processing were adopted from prior studies, as they had been shown to be statistically reliable. the variables are discussed in further detail below. the public domain big five inventory (bfi) scale by john and srivastava ( ) was used to determine the personality traits. this instrument (see appendix a) consists of items scored on a five-point likert scale ( = disagree strongly to = agree strongly) and determines which of the five personality traits a person's personality predominantly fits. the bfi has been shown to have solid psychometric properties when compared with other, more comprehensive personality tests ( john and srivastava, ) . in the survey instrument, this section consisted of six stimuli, namely images of social network phishing-related (i.e. persuasive) messages found on facebook and obtained personally by the researcher. as mentioned earlier, persuasion is increased if the message is relevant to the audience ( petty and cacioppo, ) . as the respondents were students accustomed to engaging on snss using their mobile devices, the stimuli were screenshots taken from the facebook smartphone app. none of the screenshots contained spelling errors, which the literature identifies as one of the cues that may assist in detecting phishing. as the primary focus of the study was not to determine which persuasion principle is most effective, not all persuasion principles were tested. as reported in table , each screenshot indicated that a particular action was required from the user (e.g. to click on "play"). the purpose of including a variety of phishing cases was to reduce potential respondent bias, as respondents might give more attention to some messages than others based on their interests or prior encounters. heuristic processing was measured by adopting a four-item scale (see appendix b) used in prior research ( griffin et al., ; vishwanath, b ) . systematic processing was measured using a three-item scale (see appendix b) adapted from prior research ( griffin et al., ; vishwanath et al., ) . both the heuristic and the systematic items were scored on a five-point likert scale ( = disagree strongly to = agree strongly). the above-mentioned items were combined for each stimulus, giving a total of seven items per stimulus. separating the items according to whether they were heuristic or systematic could have influenced respondents to answer in a way they consider morally acceptable rather than reflecting their true behaviour. fig. depicts a screenshot of a phishing email personally received by the researcher, which was subsequently used in the survey to test phishing susceptibility. the email depicted in fig. is designed to appear as if it originated from facebook, with the sender address up-date@facebookmail.com, and it employs the blue theme typically associated with facebook branding. the purpose of this variable was to test susceptibility to phishing directly, and for this purpose a multiple-choice item scale was used (see appendix c). in the analysis, we employed generalised structural equation modelling (gsem) to take into account the binary dependent variable we created to test phishing susceptibility (coded as = not susceptible; = susceptible). the items "reply to the email" and "check the attachment because i am interested to know what my friend has to say" were considered to indicate phishing susceptibility, while not susceptible was represented by the items "immediately delete the email", "ignore the email" and "i do not trust this email". the item "unsure" was treated as a missing observation, as it does not reveal the exact position of the respondent.
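a minimal sketch of the dependent-variable coding described above is given below. the option strings are abridged paraphrases of the survey choices, and the usual 0/1 convention is assumed for the "not susceptible"/"susceptible" codes, with "unsure" treated as missing, mirroring the scheme in the text.

```python
import pandas as pd

# Abridged paraphrases of the survey options; the exact wording is illustrative.
SUSCEPTIBLE = {"reply to the email", "check the attachment"}
NOT_SUSCEPTIBLE = {"immediately delete the email", "ignore the email", "i do not trust this email"}

def code_susceptibility(choice: str):
    """1 = susceptible, 0 = not susceptible, missing for 'unsure' (0/1 convention assumed)."""
    choice = choice.strip().lower()
    if choice in SUSCEPTIBLE:
        return 1
    if choice in NOT_SUSCEPTIBLE:
        return 0
    return pd.NA  # 'unsure' (or anything unrecognised) treated as a missing observation

responses = pd.Series(["Ignore the email", "Check the attachment", "Unsure"])
print(responses.map(code_susceptibility).tolist())  # [0, 1, <NA>]
```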
structural equation modeling (sem), also referred to as path analysis, is known for representing causal relations in multivariate data in the behavioural and social sciences disciplines ( mcdonald and ho, ) . sem provides a way to test the relationships among observed and latent variables holistically and allows for theory testing even when experiments are not possible ( savalei and bentler, ) . the statistical software package stata® was used to conduct the data analysis. likert scales were predominantly used and are typically regarded as observed variables, represented graphically by squares or rectangles, while unobserved variables are termed latent factors or constructs and are depicted graphically by circles or ovals ( schreiber et al., ) . sem consists of two main parts, the measurement model and the structural model ( civelek, ; hair et al., ) ; mcdonald and ho ( ) state that the latter is a composite of the measurement and path models. the measurement model is a conventional confirmatory factor model that represents a set of observable variables as multiple indicators of a smaller set of latent variables ( mcdonald and ho, ) . in simpler terms, the measurement model pertains to how observed variables relate to unobserved variables; in sem, the measurement model corresponds to confirmatory factor analysis ( civelek, ) . owing to the limitations of cronbach's alpha, it is technically more appropriate for researchers to report composite reliability (cr) values, because cr takes into consideration the different outer loadings of the indicator variables ( hair et al., ) . much like cronbach's alpha, cr values exceeding . , as shown in table , are deemed acceptable for reliability ( chin, ) . convergent and discriminant validity are both considered subcategories of construct validity. firstly, the convergent validity of the items was examined through the factor loadings and composite reliability (cr); factor loadings exceeding . demonstrate acceptable convergent validity ( civelek, ) , and items loading less than . were dropped from the model. secondly, for discriminant validity we used the fornell-larcker criterion, comparing the square root of the average variance extracted (ave) against the correlation coefficients of the latent variables ( fornell and larcker, ) . for adequate discriminant validity, the norm is that the square root of each construct's ave should be greater than its highest correlation with any other construct ( hair et al., ) . table shows the correlation matrices and their discriminant validities. as noted earlier, the structural model is based on the measurement model ( civelek, ) . the goal of path analysis, and more generally of sem, is to determine how well the proposed model, which is a set of specified causal and non-causal relationships among variables, accounts for the observed relationships among these variables. to evaluate the proposed model constructs, the structural model incorporated path analysis, which indicated not only the magnitude of the relationships between the constructs but also whether these relationships are statistically significant. chin et al. ( ) state that researchers should not only indicate whether the relationship between variables is significant but also report the effect size. this view is shared by bowman ( ) , who adds that all data analyses should report relevant effect size statistics because, although p-values may indicate statistical significance relative to the null, they offer no insight into the magnitude of an effect. the effect size (f²) indicates whether constructs have a substantive impact on one another; in simple terms, effect size assesses the strength of the relationship between the latent variables and therefore helps researchers to assess the overall contribution of a research study ( sullivan and feinn, ) . the guidelines for assessing f² are values of . - . , . - . , and . and above, which respectively represent small, medium and large effects of an exogenous latent variable on an endogenous latent variable, while values of less than . indicate no effect ( sullivan and feinn, ) . table reports the path estimates, t-statistics, effect sizes and overall statistical significance. the path diagram illustrated in fig. shows the hypothesised associations with the corresponding beta (β) values and p-values. model fit determines the extent to which the proposed model fits the sample data ( schermelleh-engel et al., ) . barrett ( ) controversially advocates an outright ban on approximate fit indices and posits that the chi-square (χ²) exact fit test is the only applicable test of model fit for sem. the χ² test statistic is the only goodness-of-fit measure that has an associated significance test, while all other measures are descriptive ( schermelleh-engel et al., ) . a non-significant χ² result (p > . ) is desired, indicating a good fit between the sample covariance matrix and the model-implied covariance matrix ( barrett, ) . the χ² test achieved an acceptable fit: χ² = . , df = and p = . .
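the reliability and validity checks described above can be reproduced directly from standardised factor loadings, construct correlations and r² values. the sketch below implements the standard formulas for composite reliability, ave, the fornell-larcker comparison and cohen's f²; the numbers passed in are placeholders, not the study's estimates.

```python
import numpy as np

def composite_reliability(loadings) -> float:
    """CR = (sum(l))^2 / ((sum(l))^2 + sum(1 - l^2)) for standardised loadings."""
    lam = np.asarray(loadings, dtype=float)
    return float(lam.sum() ** 2 / (lam.sum() ** 2 + (1 - lam ** 2).sum()))

def average_variance_extracted(loadings) -> float:
    """AVE = mean of the squared standardised loadings."""
    lam = np.asarray(loadings, dtype=float)
    return float((lam ** 2).mean())

def fornell_larcker_holds(aves, correlations) -> bool:
    """True if sqrt(AVE) of every construct exceeds its correlations with all other constructs."""
    sqrt_ave = np.sqrt(np.asarray(aves, dtype=float))
    corr = np.abs(np.asarray(correlations, dtype=float))
    np.fill_diagonal(corr, 0.0)
    return bool(np.all(sqrt_ave > corr.max(axis=1)))

def effect_size_f2(r2_included: float, r2_excluded: float) -> float:
    """Cohen's f2 for one exogenous construct: (R2_incl - R2_excl) / (1 - R2_incl)."""
    return (r2_included - r2_excluded) / (1 - r2_included)

# Placeholder loadings for one construct and a small 3-construct correlation matrix.
loadings = [0.72, 0.68, 0.81, 0.70]
print(round(composite_reliability(loadings), 3), round(average_variance_extracted(loadings), 3))
corr = [[1.0, 0.41, 0.33], [0.41, 1.0, 0.28], [0.33, 0.28, 1.0]]
print(fornell_larcker_holds([0.53, 0.50, 0.55], corr))
print(round(effect_size_f2(0.30, 0.27), 3))  # small by the commonly cited 0.02 / 0.15 / 0.35 guideline
```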
in addition to the χ² test, kline ( ) recommends reporting the following approximate fit indices: the root mean square error of approximation (rmsea), the standardised root mean square residual (srmr) and the comparative fit index (cfi). hu and bentler ( ) , schermelleh-engel et al. ( ) and hooper et al. ( ) provide a set of acceptable "rule of thumb" thresholds, which were considered in interpreting the various fit indices for the model. the rmsea determines to what extent the model, with unknown but optimally chosen parameter estimates, would fit the population covariance matrix ( hooper et al., ) . an rmsea value of zero indicates the best result ( kline, ) ; however, a cut-off value close to . , or a strict upper limit of . , appears to be the accepted norm ( hooper et al., ) . the srmr is an absolute fit index computed from the square root of the discrepancy between the residuals of the sample covariance matrix and those of the hypothesised model. similar to the rmsea, an srmr value of zero indicates perfect fit ( hooper et al., ) , and values as high as . are deemed an acceptable fit ( hu and bentler, ) . the cfi, an incremental fit index, compares the sample covariance matrix with a null model in which all latent variables are assumed to be independent ( hooper et al., ) . a cfi value exceeding . is required to ensure that "misspecified" models are not accepted ( hu and bentler, ) . table presents a summary of the approximate fit indices with their associated threshold values.
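the rmsea and cfi referred to above can be recovered from the χ² statistics of the fitted model and of an independence (null) model; srmr is omitted here because it additionally requires the residual covariance matrix. the sketch below uses placeholder statistics and an assumed sample size, not the values reported in the study.

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation from the model chi-square."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2: float, df: int, chi2_null: float, df_null: int) -> float:
    """Comparative fit index of the model relative to the independence (null) model."""
    d_model = max(chi2 - df, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null

# Placeholder statistics for an assumed sample of n = 200 respondents.
print(round(rmsea(chi2=52.4, df=44, n=200), 3))                      # ~0.031, below commonly cited cut-offs of about 0.06-0.08
print(round(cfi(chi2=52.4, df=44, chi2_null=880.0, df_null=66), 3))  # ~0.990, above the commonly cited 0.95 threshold
```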
this section presents the results of the various statistical tests discussed in the previous section. most results in table show an acceptable level of composite reliability, as all the constructs exceeded . except for infop and agreeableness. in table , we examined the correlations between all pairs of constructs in order to establish their discriminant validity; the correlations between all of these pairs were below the recommended threshold value of . ( safa et al., ) , suggesting that all constructs are distinct from each other. table presents the results of the hypothesis tests and the associated relationships between the five personality traits, heuristic and systematic processing, and phishing susceptibility. each path is a hypothesised correlation between variables representing the causal and consequent constructs of a theoretical proposition ( lowry and gaskin, ) . the results of the hypothesis tests presented in table show that some of the personality traits have significant relationships with heuristic and systematic information processing. following the hypothesis tests and outcomes presented in table , the structural model was created. fig. , depicted as a path diagram, presents the theoretical model demonstrating the predictors of phishing susceptibility on snss in terms of personality traits and information processing; the model shows the correlation coefficients and the significance of the relationships between the variables. for further insight, direct phishing susceptibility was examined by excluding the influences of personality traits and information processing from the analysis. the data revealed that respondents would fall victim to a phishing email appearing to originate from facebook, as . % of the respondents chose the option "check the attachment because i am interested to know what my friend has to say", while . % would delete the email and only . % did not trust the phishing email. the values of the approximate fit indices reported in table support the conclusion that the estimated model provides an acceptable fit to the data, implying that inferences can be made from the study findings, which are discussed next. the present study revealed that, apart from extraversion and (partly) conscientiousness, personality traits do indeed have significant relationships with both heuristic and systematic processing, which may lead to phishing susceptibility on snss. it was unexpected that extraversion would be the sole construct to have no statistically significant influence on either heuristic (β = . , p = . ) or systematic processing (β = . , p = . ); as such, our results did not support hypotheses a and b. owing to the characteristics that describe the extraversion trait, it was anticipated that respondents who possess this trait would be excited and would act impulsively towards the stimuli, thereby resorting to heuristic processing. this expectation is consistent with lawson et al. ( ) , who found extraversion to be highly predictive of susceptibility to phishing emails. it was also anticipated that these users would be less likely to apply the cognitive resources aligned with systematic processing. our results could be explained by the finding that the extraversion trait is sensitive to the cultural background of individuals ( rolland, ) . as expected, agreeableness was found to be statistically significant, with a positive influence on both heuristic (β = . , p = . , small effect) and systematic processing (β = . , p = . , small effect), thereby supporting h a and h b. as the agreeableness trait describes individuals as tolerant, cooperative, tending to experience emotional concern for others' wellbeing and trusting of others, it was predicted that such users might process the stimuli in either mode. while this may be deemed contradictory, it does align with the large body of literature that shares these contradictory findings. for example, the study by enos et al. ( ) revealed that people with high agreeableness were better at detecting deception, while conversely modic and lea ( ) found that highly agreeable people are more susceptible to phishing because they are more likely to trust in uncertain situations. alkış and taşkaya temizel ( ) found that agreeableness is the personality trait most susceptible to persuasion strategies, and cusack and adedokun ( ) concluded that users high in agreeableness are likely to be more susceptible to se attacks than others. alseadoon et al. ( ) found that agreeableness increased user tendency to comply with phishing email requests, while albladi and weir ( ) found that agreeableness significantly decreased susceptibility to phishing. ryan and xenos ( ) found that facebook users are less conscientious than non-users of the platform. in our study, conscientiousness was found to be statistically significant for heuristic processing, with a negative influence (β = - . , p = . , small effect), thus supporting hypothesis a. as expected, this indicates that an individual with the conscientiousness trait would not process heuristically and would thus be less likely to fall victim to social network phishing. this finding supports those of albladi and weir ( ) and parish et al. ( ) . a study by moutafi, furnham and paltiel ( moutafi et al., ) found consistent evidence that intelligence is strongly negatively correlated with conscientiousness.
moutafi et al. ( ) argued that this is caused by fluid intelligence, which is the capacity to think logically and solve problems in novel situations independently of acquired knowledge. this explanation may also account for why such users would resort to heuristic processing. by contrast, conscientiousness was not found to be statistically significant for systematic processing (β = . , p = . ), although the influence was positive; hypothesis b is therefore rejected. neuroticism was found to be statistically significant for both heuristic and systematic processing. however, it was predicted that neuroticism would be negatively related to heuristic processing (β = . , p = . , small effect), which was not the case in our findings; hypothesis a was therefore rejected. as mentioned by parish et al. ( ) , neuroticism has been associated with computer anxiety, which may indirectly help protect individuals with this trait against cybercrime. our findings revealed that neuroticism is significantly positively related to systematic processing (β = . , p = . , no effect), thus supporting hypothesis b. this finding supports the studies by sumner et al. ( ) and li et al. ( ) , who found that users high in neuroticism were more concerned about their privacy, and it also supports albladi and weir's ( ) finding that neuroticism significantly decreased susceptibility to phishing. openness was found to be statistically significant for both heuristic and systematic processing. individuals with the personality trait of openness are intellectually curious, and it was therefore anticipated that, given the nature of the images depicted in the stimuli, they would process the stimuli heuristically. as expected, openness has a positive relationship with heuristic processing (β = . , p = . , no effect), thus supporting hypothesis a. this supports studies by halevi et al. ( ) and alseadoon et al. ( ) , who found that openness is closely related to high phishing susceptibility. hypothesis b is rejected, as the relationship with systematic processing was expected to be negative but was found to be positive (β = . , p = . , small effect). this might substantiate the findings of pattinson et al. ( ) , who found that individuals with the trait of openness were better at detecting phishing emails. in addition, the study by kreitz et al. ( ) showed that, in comparison with the other big five traits, individuals with the openness trait are more perceptive, as they were able to detect unexpected stimuli in their environment. as expected, heuristic processing had a significant positive effect on susceptibility to phishing (β = . , p = . , small effect), therefore supporting hypothesis and the results of vishwanath ( b ) . however, the relationship of systematic processing to phishing susceptibility was not statistically significant (β = - . , p = . , no effect), and hypothesis was therefore rejected. although not statistically significant, the data revealed that systematic processing was negatively related to phishing susceptibility, thus decreasing the risk posed by phishing. the overall findings revealed that there are indeed significant relationships between several personality traits and information processing, and also that the mode of processing influences susceptibility to phishing on snss.
however, following the results of the hypothesis tests, the current study has revealed that predicting the mode of information processing a user would adopt, based on personality traits, produced some unexpected results. the results showed that certain traits, such as agreeableness, neuroticism and openness, were associated with processing information in both modes, thus supporting the dual nature of the hsm, in which both modes can occur simultaneously. ironically, this aspect could explain the contradictory findings in the phishing literature related to the big five personality traits. furthermore, our findings suggest that, apart from personality traits, information processing could also be influenced by the context or persuasion technique ( vishwanath et al., ) . this is further explained by mcandrew ( ) , who states that behaviours associated with a particular personality trait can be influenced by specific situations and environments. this is also highlighted by johnson ( ) , who points out that personality traits do not mean that someone's reactions are absolutely consistent; people may react consistently to similar situations but they may also respond differently in the same situations. similarly, cusack and adedokun ( ) are of the view that traits are also influenced by moderating variables such as emotional state, the environment and motivations. as mentioned earlier, the big five personality scale, classified into five distinct classes, has been shown to be reliable and consistent across many studies. in contrast, cusack and adedokun ( ) state that the big five taxonomy defines personalities along a continuum rather than in categories or types, thus allowing for different types of behaviour under different circumstances. as such, the current study provides a "snapshot" view of the students' perceptions and behaviour at that particular time; if the survey were to be conducted again in a different environment, the results could be somewhat different. this creates opportunities for other researchers to conduct similar studies or to improve on this study by considering different variables and environments. while our study offers some insights for behavioural researchers, there were several limitations that open possibilities for future research. the convenience sampling method and the small sample size limit the generalisability of the findings, as the sample, consisting solely of students, was not representative of the general public. furthermore, the stimuli used in the instrument to test information processing originated on the researcher's facebook profile. this creates bias, as the stimuli originated from acquaintances connected to the researcher and not to the respondents. to make the experiment more reflective of its intended environment, researchers could create a facebook profile (i.e. a dummy account) that survey participants could add by responding to a friend request. respondents could then indicate how they would respond to stimuli appearing on their timeline that originate from the researcher's dummy profile. however, this would have to be carefully designed in adherence to ethical guidelines and practices. the study also assumed that respondents would address the section on information processing in the survey with the same amount of time and attention to detail as they would in an online social network environment.
as the instrument was a survey, the measures used were indirect and consequently could not capture response time, which could be particularly important with regard to information processing. although the survey instrument consisted of several persuasive stimuli, respondents only had to deal with one post at a time, whereas in the sns environment users would be exposed to a larger set of posts appearing on their timeline at once. furthermore, the instrument included the option "i ignored the message content"; however, this did not take into account users "cognitively" ignoring stimuli. it is therefore possible that users could dismiss posts passing through their timeline, making no decision and applying no mode of processing. this study did not aim to identify which specific persuasion strategy users are most susceptible to, as has been done previously in other studies, and the current instrument design therefore did not test responses for each of the six persuasion principles. however, as noted by lawson et al. ( ) , phishers tend to use a combination of persuasion types and thus, in such cases, the instrument in its current form cannot determine which specific persuasion principle a user is more likely to fall victim to. furthermore, measuring certain principles such as "liking" makes it difficult to draw conclusions, as individuals each have their own set of preferences. it was also not possible to assess the effect of personality traits on information processing in respect of the persuasion principles of "commitment" and "reciprocity", as this would require prior knowledge of the respondents' past choices and commitments. these specific principles were also identified by butavicius et al. ( ) as being less suited to their laboratory study. the study has implications for organisations, as they could develop a similar instrument to identify employees at risk of phishing. organisations could use the personality test together with an assessment tool that examines employees' preferences, for example their interest in free prizes, movie genres, employment opportunities, financial stability and the like. these preferences could identify potential behavioural vulnerabilities that phishers could use to persuade victims on both email and snss. following the identification of vulnerable employees, organisations could classify these users accordingly and design security awareness programmes orientated towards addressing employees' personal sets of vulnerabilities with consideration of their personality traits. the current study also has implications for researchers. research into personality traits and information processing and their influence on phishing susceptibility has the potential to grow further. the model could be extended to include other variables such as perceived risk, self-efficacy, knowledge, social norms, culture and the like, which could potentially offer further insights into phishing susceptibility. moreover, there is a lack of studies investigating the influence of personality traits on habit. this was pointed out by wood ( ) , who stated that "habit" is largely missing from modern social and personality psychology. a study by vishwanath ( b ) concluded that habits and information processing jointly influence phishing susceptibility.
similarly, frauenstein and flowerday ( ) posited that the habitual behaviours exhibited by social network users could lead them to process phishing messages on snss without sufficient consideration, thus becoming vulnerable to social network phishing. the threat of phishing continues to pose a problem for both organisations and consumers, and protection against phishing has limitations when it relies solely on technical controls. phishers take advantage of new events, catastrophes and global headlines when designing persuasive messages, making it difficult to predict what user education should address. people may serve as a protective measure, but only if they "recognise" the threat. however, owing to the individual behavioural vulnerabilities that characterise each user, security awareness efforts may be ineffective when users are faced with phishing. thus, any steps taken to protect users should also include understanding the individual characteristics that may influence user behaviour and make them vulnerable. in addition, the popularity of snss creates new opportunities for phishers to exploit the behavioural vulnerabilities of their users. prior literature has indicated that both the personality traits of an individual and the mode of information processing can influence susceptibility to phishing. the current study makes a contribution by bringing together these two distinct areas of research to better understand their relationship to phishing susceptibility on snss. this study proposed a theoretical model that can help identify the types of user who are more likely to be susceptible to phishing on snss and is an essential step towards improving online security. prior literature has highlighted inconsistent findings with regard to personality type and its direct relationship with phishing susceptibility. similarly, our study revealed that the big five traits of agreeableness, neuroticism and openness had a positive influence on both heuristic and systematic processing. conscientiousness was found to have a negative influence on heuristic processing; it is therefore expected that if conscientious people are faced with phishing on snss, they are more likely to inspect it closely before resorting to heuristic processing. extraversion was the only trait found to have no statistically significant influence on either mode of processing. the study also confirmed that heuristic processing significantly increases susceptibility to phishing on snss, thus supporting prior studies in this area. this article has not been published or accepted for publication and it is not under consideration at any other outlet. to our knowledge, we have no known conflicts of interest with this work.
appendix a. big five inventory (bfi) items ( john and srivastava, )
items measured ( = disagree strongly - = agree strongly); (r) denotes reverse-scaled items.
extraversion: is talkative; is reserved (r); is full of energy; generates a lot of enthusiasm; tends to be quiet (r); has an assertive (i.e. confident) personality; is sometimes shy, inhibited (r); is outgoing, sociable.
agreeableness: tends to find fault with others (r); is helpful and unselfish with others; starts quarrels (i.e. arguments) with others (r); has a forgiving nature; is generally trusting; can be cold and aloof (i.e. distant) (r); is considerate and kind to almost everyone; is sometimes rude to others (r); likes to cooperate with others.
conscientiousness: does a thorough job; can be somewhat careless (r); is a reliable worker; tends to be disorganized (r); tends to be lazy (r); perseveres until the task is finished; does things efficiently; makes plans and follows through with them; is easily distracted (r).
neuroticism: is depressed, blue; is relaxed, handles stress well (r); can be tense (i.e. nervous, anxious); worries a lot; is emotionally stable, not easily upset (r); can be moody; remains calm in tense situations (r); gets nervous easily.
openness: is original, comes up with new ideas; is curious about many different things; is ingenious (i.e. clever), a deep thinker; has an active imagination; is inventive; values artistic (i.e. beauty), aesthetic experiences; prefers work that is routine (i.e. procedure) (r); likes to reflect, play with ideas; has few artistic interests (r); is sophisticated in art, music, or literature.
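a scoring sketch for the bfi items listed in appendix a is given below: reverse-scaled items marked (r) are recoded as (6 − response) on the five-point scale and each trait is averaged over its items. the column names and the two abbreviated item sets shown are illustrative assumptions, not the full instrument.

```python
import pandas as pd

LIKERT_MAX = 5  # five-point scale; a reverse-scaled item is recoded as (LIKERT_MAX + 1) - response

# Hypothetical column names and an abbreviated item-to-trait mapping (True = reverse scaled).
TRAIT_ITEMS = {
    "extraversion": [("is_talkative", False), ("is_reserved", True), ("tends_to_be_quiet", True)],
    "neuroticism": [("can_be_tense", False), ("is_relaxed", True), ("worries_a_lot", False)],
}

def trait_scores(responses: pd.DataFrame) -> pd.DataFrame:
    """Average each respondent's items per trait after reverse-scoring the flagged items."""
    scores = {}
    for trait, items in TRAIT_ITEMS.items():
        columns = [
            (LIKERT_MAX + 1) - responses[name] if reverse else responses[name]
            for name, reverse in items
        ]
        scores[trait] = pd.concat(columns, axis=1).mean(axis=1)
    return pd.DataFrame(scores)

answers = pd.DataFrame({
    "is_talkative": [4, 2], "is_reserved": [2, 5], "tends_to_be_quiet": [1, 4],
    "can_be_tense": [3, 5], "is_relaxed": [4, 1], "worries_a_lot": [2, 5],
})
print(trait_scores(answers))  # one mean score per trait per respondent
```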
references
malicious accounts: dark of the social networks
analysing persuasion principles in phishing emails
personality traits and cyber-attack victimisation: multiple mediation analysis
phishing environments, techniques, and countermeasures
social engineering in social networking sites: phase-based and source-based models
an empirical study on the susceptibility to social engineering in social networking sites: the case of facebook
social engineering in social networking sites: how good becomes evil
the impact of individual differences on influence strategies
what is the influence of users' characteristics on their ability to detect phishing emails?
internet and personality
the relationship between extraversion and neuroticism and the different uses of the internet
social network use and personality. comput
assessing construct validity in organizational research
analysis of student vulnerabilities to phishing
structural equation modeling: adjudging model fit
the importance of effect size reporting in communication research reports
risky behavior via social media: the role of reasoned and social reactive pathways
the persuasion and security awareness experiment: reducing the success of social engineering attacks
breaching the human firewall: social engineering in phishing and spear-phishing emails
making security awareness training work
a practitioner's guide to persuasion: an overview of selected persuasion theories, models and frameworks
from the editors: common method variance in international business research
motivated heuristic and systematic processing
misleading online content: recognizing clickbait as "false news"
the partial least squares approach to structural equation modeling
a partial least squares latent variable modeling approach for measuring interaction effects: results from a monte carlo simulation study and an electronic-mail emotion/adoption study
effect of personality traits on trust and risk to phishing vulnerability: modeling and analysis
comparative analysis of mobile phishing detection and prevention approaches
influence: the psychology of persuasion
essentials of structural equation modeling. zea e-books
who interacts on the web?: the intersection of users' personality and social media use
the revised neo personality inventory (neo-pi-r)
attitudes and persuasion
the impact of personality traits on user's susceptibility to social engineering attacks
intentions to use social networking sites (sns) using technology acceptance model (tam): an empirical study
the psychology of attitudes. harcourt brace jovanovich college publishers
personality factors in human deception detection: comparing human to machine performance
chapter - role of the architect
principles of persuasion in social engineering and their use in phishing
online social networks: threats and solutions
evaluating structural equation models with unobservable variables and measurement error
an investigation into students responses to various phishing emails and other phishing-related behaviours
social network phishing: becoming habituated to clicks and ignorant to threats?
personality phishing: can we spot the signs
investigating the role of personality traits and influence strategies on the persuasive effect of personalized recommendations
correlating human traits and cyber security behavior intentions
linking the heuristic-systematic model and depth of processing
online persuasion and compliance: social influence on the internet and beyond
understanding nonmalicious security violations in the workplace: a composite behavior model
primer on partial least squares structural equation modeling (pls-sem)
a pilot study of cyber security and privacy related behavior and personality traits
spear-phishing in the wild: a real-world study of personality, phishing self-efficacy and vulnerability to spear-phishing attacks
phishing attacks over time: a longitudinal study
a user-centered approach to phishing susceptibility: the role of a suspicious personality in protecting against phishing
examining the impact of presence on individual phishing victimization
a taxonomy of attacks and a survey of defence mechanisms for semantic social engineering attacks
structural equation modeling: guidelines for determining model fit. electron
cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives
effect of frame of mind on users' deception detection attitudes and behaviours. paper presented at the international conference on information resources management
social phishing. commun
persuading end users to act cautiously online: a fear appeals study on phishing
the big five trait taxonomy: history, measurement, and theoretical perspectives
units of analysis for the description and explanation of personality
principles and practice of structural equation modeling
research methodology: methods and techniques. new age international
some see it, some don't: exploring the relation between inattentional blindness and personality factors
advanced social engineering attacks
online social networking and addiction: a review of the psychological literature
achieving a consensual definition of phishing based on a systematic review of the literature
baiting the hook: exploring the interaction of personality and persuasion tactics in email phishing attacks
interaction of personality and persuasion tactics in email phishing attacks
exploring how personality affects privacy control behavior on social networking sites
who do you troll and why: an investigation into the relationship between the dark triad personalities and online trolling behaviours towards popular and less popular facebook profiles
partial least squares (pls) structural equation modeling (sem) for building and testing behavioral causal theory: when to choose it and how to use it
investigating phishing victimization with the heuristic-systematic model: a theoretical framework and an exploration
assessing individual differences in a phishing detection task
when do personality traits predict behavior? personality can predict behavior, but only when we understand its limitations. psychol. today
a five-factor theory of personality
an introduction to the five-factor model and its applications
principles and practice in reporting structural equation analyses
dispositional factors in internet use: personality versus cognitive style
#imstaying is a playground for criminals. mybroadband
the art of deception: controlling the human element of security
how neurotic are scam victims, really? the big five and internet scams
the influence of personality on facebook usage, wall postings, and regret
fishing for phishers. improving internet users' sensitivity to visual deception cues to prevent electronic fraud
why is conscientiousness negatively correlated with intelligence
#covid drives phishing emails up % in under a month. infosecurity magazine
google blocking m malicious coronavirus emails every day. cnet
understanding insider threat: a framework for characterising attacks
the phishing guide: understanding & preventing phishing attacks
investigation of the influence of personality traits on cialdini's persuasive strategies
a personality based model for determining susceptibility to phishing attacks
understanding and mitigating uncertainty in online exchange relationships: a principal-agent perspective
the elaboration likelihood model of persuasion
central and peripheral routes to attitude change. in: communication and persuasion
common method biases in behavioral research: a critical review of the literature and recommended remedies
self-reports in organizational research: problems and prospects
using social networks to harvest email addresses
identity theft on social networking sites: developing issues of internet impersonation
the cross-cultural generalizability of the five factor model of personality
who uses facebook? an investigation into the relationship between the big five, shyness, narcissism, loneliness, and facebook usage
information security policy compliance model in organizations
sim swap fraud: no way out?: financial law
the handbook of marketing research: uses, misuses, and future advances
evaluating the fit of structural equation models: tests of significance and descriptive goodness-of-fit measures
reporting structural equation modeling and confirmatory factor analysis results: a review
defending against spear-phishing: motivating users through fear appeal manipulations
who falls for phish?: a demographic analysis of phishing susceptibility and effectiveness of interventions
personality, attitudes, and intentions: predicting initial adoption of information security behavior
understanding scam victims: seven principles for systems security
using effect size - or why the p value is not enough
determining personality traits & privacy concerns from facebook activity
facebook admits up to m users are fake and duplicate accounts. telegraph
online deception in social media
the social engineering personality framework
an exploration of phishing information sharing: a heuristic-systematic approach
data breach investigations report (dbir)
habitual facebook use and its impact on getting deceived on social media
examining the distinct antecedents of e-mail habits and its influence on the outcomes of a phishing attack
suspicion, cognition, and automaticity model of phishing susceptibility
why do people get phished? testing individual differences in phishing vulnerability within an integrated, information processing model
developing and evaluating a five minute phishing awareness video
investigation of user behavior on social networking sites
countering social engineering through social media: an enterprise security perspective
individual differences in susceptibility to online influence: a theoretical review
habit in personality and social psychology
gaining access with social engineering: an empirical study of the threat
wisecrackers: a theory-grounded investigation of phishing and pretext social engineering threats to information security
the influence of experiential and dispositional factors in phishing: an empirical investigation of the deceived
finding the weakest links in the weakest link: how well do undergraduate students make cybersecurity judgment
effects of personality on risky driving behavior and accident involvement for chinese drivers
thinking styles and the big five personality traits revisited
how could i fall for that? exploring phishing victimization with the heuristic-systematic model
appendix b. information processing items ( griffin et al., ; vishwanath et al., )
items measured ( = disagree strongly - = agree strongly)