key: cord-1053756-zomr7xgk authors: Ravizza, Alice; Sternini, Federico; Molinari, Filippo; Santoro, Eugenio; Cabitza, Federico title: A Proposal For COVID-19 Applications Enabling Extensive Epidemiological Studies date: 2021-12-31 journal: Procedia Computer Science DOI: 10.1016/j.procs.2021.01.206 sha: f3fa167354d7ce5b5f563f2213878325bc96c30f doc_id: 1053756 cord_uid: zomr7xgk During the next phase of COVID-19 outbreak, mobile applications could be the most used and proposed technical solution for monitoring and tracking, by acquiring data from subgroups of the population. A possible problem could be data fragmentation, which could lead to three harmful effects: i) data could not cover the minimum percentage of the people for monitoring efficacy, ii) it could be heavily biased due to different data collection policies, and iii) the app could not monitor subjects moving across different zones or countries. A common approach could solve these problems, defining requirements for the selection of observed data and technical specifications for the complete interoperability between different solutions. This work aims to integrate the international framework of requirements in order to mitigate the known issues and to suggest a method for clinical data collection that ensures to researchers and public health institution significant and reliable data. First, we propose to identify which data is relevant for COVID-19 monitoring through literature and guidelines review. Then we analysed how the currently available guidelines for COVID-19 monitoring applications drafted by European Union and World Health Organization face the issues listed before. Eventually we proposed the first draft of integration of current guidelines. While the acute and emergency phase of the Coronavirus pandemic is not over yet, the world is trying to prepare to the phase dedicated to the economy restart, still characterized by the virus presence. 
This phase will focus on the main challenge of minimizing contagion without resorting to extreme measures such as the lockdown currently affecting a vast number of countries around the world. At the time of drafting this paper, many task forces and expert panels are trying to define the correct policies for managing the post-emergency phase. For example, in Italy the government has instituted a task force to organize "Phase 2" of the coronavirus emergency, while other actors, such as the Politecnico di Torino, are dedicating considerable resources to defining good practices that limit the spread of the virus without affecting the life of citizens. In this context, much attention is directed toward new technologies that could help to contain the spread of the virus. In particular, mobile applications are widely discussed across different media, prompting proposals and political and ethical debates. The choice of the contact tracing app is the main topic across newspapers and breaking news channels. The themes covered by newspapers and local media mainly concern the privacy aspects of the data collection performed by the app and the problems raised by a government-imposed app. While public opinion is focused on these (non-trivial) problems, at least four other issues should be addressed:
• Localism: the use of different applications in different zones could lead to deviations in data collection methods and policies (minor deviations, since all data collection campaigns are assumed to follow the World Health Organization indications). These deviations would introduce zone-specific biases and thus make data non-comparable across regions and countries.
• Effectiveness: at the moment there is no clear evidence of the efficacy of this approach, yet the application should be deployed as soon as possible. What is the correct compromise between the urgency of introduction and the demand for evidence of efficacy?
• Population coverage: different studies [1, 2] suggest that the app should reach 60% of the population to be effective. Use by a high percentage of the population should increase the chance of correct and complete contact tracing. Higher population coverage would also mean higher significance for epidemiological purposes. If the population is fragmented across different apps, how can each app be useful?
• Cross platform: if a subject uses a different app, either because he/she comes from another country or because users are free to choose the app they prefer, how can other people correctly trace contacts with that person?
While the selection of the app, especially if adopted by a whole country, is a political and economic choice, these issues can be solved with technical solutions. Therefore, we focused on proposing a set of technical requirements designed to overcome these problems. The next sections describe the phases of the work that led to the definition of the technical requirements, starting from the analysis of the regulatory framework, which determines the context of such applications. Then the data bias problem is analyzed through a proposed literature review intended to identify the relevant epidemiological and clinical data that each app should collect, integrated with available guidelines and the authors' experience. Finally, we propose the technical requirements intended to tackle the problems of interoperability, population coverage and efficacy, starting from available literature and guidelines.
Author name / Procedia Computer Science 00 (2019) 000-000 3
Given the nature of these apps, which are designed to collect information about users and to use it, including to inform subjects who have been in close contact with a patient positive for the novel coronavirus, the primary regulatory reference in Europe is the General Data Protection Regulation (GDPR) [3]. On the other hand, the intended use of the app determines whether it is a medical device or not. Any functionality that could help the individual subject (e.g. any COVID-19 prevention functionality) would qualify the application as a medical device, thus requiring conformity to the current directive [4]. At the time of drafting this paper, Directive 93/42/EEC is still in force and will remain in force for one more year, since the coronavirus outbreak in Europe led the European Commission and the European Parliament to delay the date of full application of the Medical Device Regulation (MDR) [5]. Defining the regulatory context in which the app will be developed should be one of the first steps for manufacturers, so that they can comply with the relevant standards from the first phases of development. For medical devices these are ISO 14971, IEC 62304 and IEC 62366 [6] [7] [8], which can be used as guidelines for the development of the app from the early stages to the preclinical validation of the software [9]. Uncontrolled application development could lead to improper data collection and therefore to biases caused, for example, by the loss of certain data or by an improper monitoring frequency. Biased data collection can lead to inconsistent application of the guidelines for outbreak identification and can make it impossible to compare data collected in different regions. While this may seem a minor problem because it does not affect the main functionality of the app (i.e. contact tracing), it can impair the subsequent use of the collected data.
Applications are intended to control citizen contacts and are designed to allow to describe precisely the contacts of each person without the need of relying on interviews and patient memory. Consequently, if the database is significant, all the data can also be used for the early identification of outbreaks. Furthermore, if data are collected with the correct procedure, and the privacy of the user is ensured during all the process, data could be used, after the proper anonymization procedure, for epidemiological studies. In this context, uniformity of data format and data exchange interoperability are crucial, so that epidemiologists will not be blocked by the time-spending procedure of databases integration, but will have access to numerous databases all compatible with each other, and therefore of easy integration and combination, allowing to broaden the scope of the epidemiological studies. To guarantee the compatibility of databases across regions and countries, the type of data requested and obtained from the patient should be the same. Therefore, here we propose to tackle the problem of data selection suggesting to complete a systematic review of the literature and to define a set of data characterized by mandatory collection. We suggest completing as a first step a systematic literature review intended to understand which are the symptoms, clinical signs, risk factors, and comorbidities associated with the novel coronavirus disease. The results of the first literature review should then be integrated with information collected from guidelines drafted by national and international bodies. At the end of this first phase, the lists of symptoms, clinical signs, comorbidities and risk factors should be used as the base for the subsequent proposal of data structure. These lists should be comprehensive of all observation reported in the literature and all information that according to the international bodies should be monitored. 
The identification and timely update of such lists is an important activity to ensure that the data collected from the population are comparable and can be integrated. In addition, consistent data collection methods and data formats should be ensured. We propose some requirements for data collection that should allow for interoperability and a smooth subsequent data analysis step. First of all, we suggest minimizing open-ended questions and, wherever possible, allowing only closed-ended questions. The only exceptions could be the description of additional symptoms or signs not already identified as associated with COVID-19 and the input of measurements (e.g. temperature, blood pressure). Hence, our proposal can be described as follows:
• Implementation of a daily symptom diary that lets the user tick, from a list consistent with the one identified by the literature and guidelines review, any symptom or sign that the patient can easily identify (i.e. all the signs that require neither radiological or haematological exams nor an objective evaluation by a clinician). We suggest including a temperature diary that allows multiple entries per day. It is crucial for data comparability that each temperature measurement is coupled with the body location where it was taken.
• Possibility of reporting any additional sign identified through clinical exams and diagnosed medical findings. Ideally, these findings should be entered by health professionals, but this would mean additional work for the many already overloaded healthcare professionals. Therefore, we think that a dedicated section for user self-reporting is appropriate, particularly if accompanied by appropriate instructions for use and a precise identification of the possible hazards of wrong self-reporting. It is noteworthy that this solution could bias the reporting of signs and findings towards low-severity findings. If this section is completed through self-reporting, a disproportion between low-severity and high-severity conditions is foreseeable, because the most severe COVID-19 cases require hospitalization and intubation and therefore produce fewer self-reports. The bias can be mitigated by defining interfaces, APIs, or automatic systems that import into the platform the data from the Electronic Health Records used by professionals. The inverse process can be used as well: professional users work on the platform and the data are then exported to the healthcare facility database.
• Possibility of reporting the therapies that the patient is following, with particular attention to pharmaceutical therapies. The intake method and timing should be recorded too. To ease data entry and improve data reliability, pharmaceutical therapies could be entered by searching for the drug in a dedicated database. Databases of pharmaceutical substances are clinically recognized and frequently used for similar purposes in hospital management software.
• Onboarding phase comprising the definition of the risk factors, with the chance to select at least the ones identified during the literature and guidelines review. It is extremely relevant to record age, which is a known major risk factor; the most appropriate way to do so is to ask for the date of birth. It could be relevant to assign to each variable the date of diagnosis or onset of the condition, to monitor how this information relates to the risk for the patient. However, this is additional information and therefore not mandatory.
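The closed-ended diary described in the list above can be pictured as a small data model. The following is a minimal sketch, not a specification: the symptom names, the body-site values and all class and method names are illustrative assumptions (the actual lists would come from the literature and guidelines review). Only the constraints it enforces come from the text: symptoms are ticked from a fixed list, several temperature entries per day are allowed, and each temperature is coupled with its body location.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from enum import Enum


class Symptom(Enum):
    """Placeholder closed list; the real one would come from the review."""
    FEVER = "fever"
    COUGH = "cough"
    FATIGUE = "fatigue"


class BodySite(Enum):
    """Hypothetical measurement sites for temperature readings."""
    AXILLARY = "axillary"
    ORAL = "oral"
    TYMPANIC = "tympanic"


@dataclass(frozen=True)
class TemperatureEntry:
    taken_at: datetime
    celsius: float
    site: BodySite  # mandatory: needed to compare readings across users

    def __post_init__(self):
        if not isinstance(self.site, BodySite):
            raise ValueError("temperature needs a body site from the closed list")


@dataclass
class DailyDiary:
    day: date
    symptoms: set = field(default_factory=set)
    temperatures: list = field(default_factory=list)
    other_symptoms: str = ""  # the only open-ended field

    def tick(self, symptom):
        # Closed-ended question: only members of the agreed list are accepted.
        if not isinstance(symptom, Symptom):
            raise ValueError(f"{symptom!r} is not in the closed symptom list")
        self.symptoms.add(symptom)

    def add_temperature(self, entry):
        # Multiple entries per day are allowed, but only for this diary's day.
        if entry.taken_at.date() != self.day:
            raise ValueError("temperature entry belongs to a different day")
        self.temperatures.append(entry)


diary = DailyDiary(day=date(2020, 4, 20))
diary.tick(Symptom.COUGH)
diary.add_temperature(
    TemperatureEntry(datetime(2020, 4, 20, 8, 0), 37.4, BodySite.AXILLARY))
diary.add_temperature(
    TemperatureEntry(datetime(2020, 4, 20, 20, 0), 38.1, BodySite.AXILLARY))
```

Rejecting free-text symptoms at entry time, rather than cleaning them afterwards, is what would keep diaries from different apps directly comparable.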
It is important to notice that the list of risk factors may include factors related to the severity of the COVID-19 condition and could be expressed through scores and physiological measurements. Therefore, it should be correct to add to the signs to be monitored such data. For example, it is suggested to monitor blood pressure values. Interoperability has been a major issue in healthcare information systems for many years. While some advances have been completed with the definition of some standards for interoperability, a framework able to ensure complete interoperability between different systems has yet to be issued. In the context of contact tracing in the COVID-19 pandemic, the primary reference guide for the development is the toolbox for the development of contact tracing apps in European union, issued by the European Commission [10] . The toolbox published by the European Commission tackles the development of contact tracing applications with attention to different themes and topics ranging from the privacy, which is discussed in detail in a second document [11] , to the cybersecurity aspect. The document correctly identifies three main core requirements needed to assess the interoperability: -definition of close contacts as per international guidelines; -the app should allow contact tracing with users using different applications; -the data should be exportable by backend procedures for the communication between different countries. Such requirements allow the development of an efficient contact tracing app, only if countries have foreseen to select just one app per country. In accordance with the general definition of the aim of the app and in accordance with the aims defined by the toolbox, these applications should be integrated into the contact tracing and in the monitoring systems already implemented to manage the epidemic. 
Therefore, these solutions are just one piece of the complex strategy of the public health institutions for the control of the epidemic and the minimization of the virus spread. The easiest way to ensure that the applications fit within the framework of public health measures is, as said, the definition of requirements. In particular, data transmission is one of the main challenges in this context. We suggest using a centralized method, built in compliance with the GDPR and with privacy constraints, but without denying the scientific community the possibility of gaining extensive and comprehensive knowledge of the infection dynamics and the clinical manifestations. To ensure access to such information, we analysed the relevant interoperability requirements drafted in the European toolbox and propose a variation of these requirements that should allow easier communication and export of data. The main integrations to the toolbox that we propose here are the following:
• Each app should have a central database with pseudonymized user information, properly encrypted and protected as required by the GDPR and cybersecurity standards. Pseudonymization is the reversible process that unlinks personal data from the subject [12]. The database should be accessible only for maintenance and for error resolution, with respect to users' data. After a proper anonymization process, the database should be easy to export and to aggregate with the databases of other manufacturers. Therefore, to ease the integration, it should be possible to extract only the epidemiologically relevant data. We suggest the use of document-based databases so that each patient can be associated with different records. Consistently with the analysis completed in the previous section, we suggest a set of records that can be associated with the patient, each with mandatory fields that should be completed: onboarding, daily symptoms diary, healthcare professional visit, end of monitoring, temperature measurement, diagnostic swab, serological examination, therapy diary. Each of these records is associated with a date and time and with the mandatory information for each field, defined as follows: Therapy diary: description of the therapy followed by the patient. For the pharmaceutical substances taken by the patient, the description should include the intake modality and the dose. Also, each intake should be entered with the associated date and time. Location information is beneficial for ex-post studies that evaluate the data in light of the swab policy of the region and of epidemic outbreaks. For this reason, it is essential to let users change their location information and to tag all records with the correct location.
• The confirmation of a COVID case should be defined uniformly across different platforms, so that the response to a close contact with a patient confirmed as infected with the coronavirus is also uniform. The toolbox already suggests using pseudo-random codes produced by the public health authorities for the confirmation of data. However, this could be a heavy workload for local public health institutions, so we suggest defining the procedure that leads to the lowest risk without requiring additional work from public health institutions. For example, we propose using the identifiers of the reports of clinical exams or visits.
All the requirements drafted here are intended to ease the management of the epidemic for researchers and public health institutions. Therefore, the use of such anonymized data should not be allowed for any purpose other than research or epidemic monitoring.
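The record layout and the export step described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the proposed implementation: the field names, the `PseudonymVault` class, the export whitelist and the sample values (a fictitious person and drug entry) are all hypothetical. Dropping the pseudonym and whitelisting fields is shown only to illustrate the idea of exporting epidemiologically relevant data; a real anonymization process would also need a re-identification risk assessment.

```python
import secrets
from datetime import datetime

# Record types listed in the text; every record carries a timestamp and a location.
RECORD_TYPES = {
    "onboarding", "daily_symptoms_diary", "healthcare_professional_visit",
    "end_of_monitoring", "temperature_measurement", "diagnostic_swab",
    "serological_examination", "therapy_diary",
}

# Hypothetical whitelist of epidemiologically relevant fields kept on export.
EXPORT_FIELDS = {"type", "timestamp", "location", "payload"}


class PseudonymVault:
    """Keeps the pseudonym -> identity link apart from the clinical data.

    The stored link makes the process reversible for whoever holds the
    vault, which is what separates pseudonymization from anonymization.
    """

    def __init__(self):
        self._identity_by_pseudonym = {}

    def pseudonymize(self, identity):
        pseudonym = secrets.token_hex(16)
        self._identity_by_pseudonym[pseudonym] = identity
        return pseudonym

    def reidentify(self, pseudonym):
        return self._identity_by_pseudonym[pseudonym]


def make_record(pseudonym, record_type, location, payload, timestamp=None):
    if record_type not in RECORD_TYPES:
        raise ValueError(f"unknown record type: {record_type}")
    if not location:
        raise ValueError("every record must be tagged with a location")
    return {
        "patient": pseudonym,
        "type": record_type,
        "timestamp": (timestamp or datetime.now()).isoformat(),
        "location": location,
        "payload": payload,
    }


def export_anonymized(records):
    # Drop the patient pseudonym and any non-whitelisted field before export.
    return [{k: v for k, v in r.items() if k in EXPORT_FIELDS} for r in records]


vault = PseudonymVault()
pid = vault.pseudonymize("Mario Rossi")  # fictitious identity
records = [
    make_record(pid, "diagnostic_swab", "Piemonte", {"result": "positive"},
                datetime(2020, 4, 20, 9, 30)),
    make_record(pid, "therapy_diary", "Piemonte",
                {"drug": "paracetamol", "dose_mg": 500, "route": "oral"},
                datetime(2020, 4, 20, 12, 0)),
]
exported = export_anonymized(records)
```

Because every app would emit the same record types with the same mandatory fields, exports from different manufacturers could be concatenated without a schema-mapping step.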
In the ideal case, the ministry of public health should access the databases, merge them, and then be responsible for their distribution to the research centres. To be effective, the apps should be used by roughly 60% of the population or more [1, 2]. Currently, the toolbox states that the design of the applications should be based on accessibility and inclusiveness principles. These solutions and the additional solutions proposed by the European Commission can enable the broad use of such applications. Furthermore, the interoperability requirements can improve population coverage, as each user can choose among many applications that are all interoperable. Even in the best case of inclusive and accessible applications, the voluntary nature of download and use can limit the penetration of the app among the population. At the same time, including the contact tracing app among the measures adopted by employers to ensure the safety of the workplace (as proposed by the Politecnico di Torino [13]) could allow broader coverage of the population and enable actual contact tracing especially in the riskier environments, namely the common areas and the places where workers spend most of their day. Even in this case, the limits of this collection method should be kept in mind: a mobile phone application can be as inclusive as possible, but it will not reach people who have no access to a mobile phone for financial or other reasons, nor people who did not have the chance to receive an education and are therefore unable to read and write. In any case, it is important to define the proper flow of all data, to assure the population that their personal data are used correctly. We suggest a data flow that assigns to the national government (through laws or regulations) the role of data controller as defined by the GDPR and the European guidelines [11].
Then, we propose to use the scheme proposed by the Politecnico di Torino, where employers distribute and activate applications for employees but do not have access to any data. All data are received and analyzed by the General Practitioner (GP) in charge, or by the local public health institution, which thus acts as the central data processor of the whole data flow. Finally, the GP anonymizes the data, preserving only the predetermined and clinically relevant fields, and sends the anonymized data to the Health Ministry, which will be responsible for distributing them to the authorized research centres. In addition, we suggest including the evaluation of human factors in the design of the application and relying on the Zhang heuristics for medical device design, in order to ensure a safe and pleasant user interface. The authors also think that ancillary functions could improve the diffusion of the app (e.g. the implementation of an electronic self-declaration to justify movements). As said above, these applications can be identified as medical devices, and therefore their suppliers should ensure that they are safe, effective and of a constant quality level. While the constant quality level of the app can be verified and validated by means of software life-cycle management procedures, safety and efficacy (that is, effectiveness in ideal conditions) should be validated by other means. The currently available reference is the same one mentioned above for the interoperability problems, i.e. the European Commission toolbox. The toolbox recommends that the Member States of the Union define suitable Key Performance Indicators (KPIs) to assess and monitor the effectiveness of the app, understood in terms of both technical performance and its balance with the preservation of fundamental rights.
However, no requirements have been drafted for the whole process of contact tracing, which includes the app and the procedures of the public health institutions. It is worth recalling that any technological solution (e.g., a contact-tracing app) must be conceived as a component of a more holistic program in which other aspects are also carefully designed and deployed, such as: conceiving a set of incentives to use the app properly; organizing a call center to give feedback to those notified by the app; organizing teams that can collect nasopharyngeal swabs (possibly two subsequent swabs for each case, given the relatively low accuracy of the molecular test) directly at the home of those people, or at least in the immediate vicinity of where they live; and organizing a set of shelters that can accommodate people in need of strict quarantine, so that they do not infect their close relatives. For this reason, the technical verification of the contact tracing function alone, in terms of the accuracy of distance and duration measurements (which is itself still of unknown accuracy and reliability), does not address the efficacy of the whole procedure triggered by contact tracing. In particular, the technical solutions could exhibit an alarmingly high rate of false positives, which must be divided into technical and clinical false positives. The former are due to the shortcomings of Bluetooth and can occur in many real-life situations, such as when one is stopped in one's car at a traffic light close to other drivers, or when two colleagues are close together but separated by a plasterboard wall. The measures that many employers and managers are taking in workplaces and places open to the public, such as Plexiglas dividers, could greatly increase this type of false contact.
The clinical false positives concern the relatively low probability that a proximity contact, which must still be defined in terms of both distance and duration, could actually cause an infection. Some simulations estimate the probability of being infected by the SARS-CoV-2 virus in casual contacts to be very low: as low as 6 out of 1,000 when 4 feet apart, and much lower still, 1 out of 100,000, when wearing a surgical mask [14]. Even assuming abuse by no-trax activism and agitators to be negligible, the overall false positive rate, due to both the technical and the clinical components, can jeopardise the whole contact-tracing process by overwhelming the socio-technical structure with a number of notifications, engagements, and tests that cannot be sustained even in the short to mid term. Besides false positives, the efficacy of a digital contact-tracing solution can also be negatively affected by false negatives. In this case, the estimation is much simpler, because a contact is detected only if both people involved use the app: if the app is adopted by 30% of the potentially susceptible subjects (which is a reasonable, if not optimistic, estimate considering the first experiences in South Korea and Singapore [15]), the app helps detect less than 10% of the contacts that actually occurred, not to mention that misuse (like not always carrying the cell phone or not keeping it switched on and connected all the time) could make this estimate even lower. Furthermore, if one quarter of the population downloaded and properly used the application (which would still be a remarkable result, given that in a country like Italy, where smartphone penetration is among the highest in Western societies, only two thirds of the population own a smartphone, and likely fewer own a device sufficiently advanced to be compatible with the solution), slightly more than 1 contact out of 20 would actually be detected.
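The false-negative figures above are consistent with a simple back-of-the-envelope model, stated here as an assumption rather than as the authors' documented method: a contact can be detected only if both people carry a working app, and adoption is independent between the two, so the detectable fraction of contacts is the square of the adoption rate. The function name below is illustrative.

```python
def detectable_contact_fraction(adoption_rate):
    """Fraction of contacts in which both parties run the app.

    Assumes independent, uniform adoption across the population;
    misuse and technical failures would lower the result further.
    """
    if not 0.0 <= adoption_rate <= 1.0:
        raise ValueError("adoption_rate must be between 0 and 1")
    return adoption_rate ** 2


for rate in (0.25, 0.30, 0.60):
    fraction = detectable_contact_fraction(rate)
    print(f"adoption {rate:.0%} -> detectable contacts {fraction:.2%}")
```

Under this model, 30% adoption gives 9% of contacts (less than 10%), 25% adoption gives 6.25% (slightly more than 1 in 20), and even the 60% coverage target would cover only 36% of contacts.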
Thus, false positives (related to technical shortcomings and epidemiological aspects) and false negatives (related to low adoption rates) are obstacles between these kinds of solutions and the achievement of their full effectiveness. Therefore, in order to obtain adequate evidence of the full effectiveness of these applications, evaluated in terms of their capability to reduce the infection rate, the hospital admission rate and, eventually, the Infection Fatality Rate (IFR) and the overall casualty rate (the only results that could justify the necessary and proportional nature of these privacy-threatening solutions as a viable response to the emergency), their use should be tested in a set of real-world local settings representative of the population in terms of age, diffusion of mobile phone use and percentage of workers (such as a set of small and medium-sized municipalities in different regions of the country). These trials would help to understand the cost-effectiveness of the whole digitally supported strategy and the impact of the adoption rate on effectiveness. In the absence of quick and agile trials in which the technical effectiveness and socio-technical responsiveness of the related interventions can be assessed, we assert that the minimum-impact solution should be preferred over the fully digital contact-tracing one. For instance, we would argue for the adoption of an app that does not perform automatic contact tracing, but rather allows users to record the names of the people with whom they have been in close contact (closer than, say, 3 meters) for more than, e.g., 5-10 minutes, and to keep a diary of their health conditions as a memory aid to be used, or possibly shown to a professional human contact tracer, in due time and if needed. Low-fi solutions seem preferable to, and more feasible than, full-fledged Bluetooth-enabled contact-tracing apps, unless and until the effectiveness of the latter has been proved [13].
Once again, although the beneficial character of contact-tracing apps is often simply taken for granted, it is rather the futile, or possibly harmful, character of any technological solution that should be explicitly vetted, or assumed until proven otherwise, especially in the case of techno-surveillance or artificial intelligence solutions.
In this paper we propose to integrate the requirements already drafted for contact tracing applications, trying to leverage the opportunities of such applications while paying attention to the major flaws that could affect them. We proposed a framework for the creation of cross-platform, cross-country epidemiological databases that could help to guide public health decisions with scientific evidence in the context of this coronavirus pandemic, and we proposed the means that we consider most appropriate to prove adequate efficacy and ensure adequate population coverage. This proposal is based on the current literature and the currently available guidelines, but it is only a first step, which should be followed by a discussion with the professionals directly involved. Therefore, the next step will be the confirmation of the proposed framework for epidemiological database creation by epidemiologists. Meanwhile, the proposal concerning efficacy and population coverage should be confirmed by technicians and the scientific community.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
[1] Effective Configurations of a Digital Contact Tracing App: A report to NHSX n
[2] Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing
[3] Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (Text with EEA relevance)
[4] Council Directive 93/42/EEC of 14 June 1993 concerning medical devices
[5] Parliament decides to postpone new requirements for medical devices | News | European Parliament
[6] European Committee for Standardization, International Organization for Standardization. EN ISO 14971:2012, Medical devices: application of risk management to medical devices
[7] International Electrotechnical Commission, International Organization for Standardization
[8] International Electrotechnical Commission. IEC 62366-1:2015, Medical devices - Part 1: Application of usability engineering to medical devices
[9] Methods for Preclinical Validation of Software as a Medical Device
[10] Mobile applications to support contact tracing in the EU's fight against COVID
[11] COMMUNICATION FROM THE COMMISSION - Guidance on Apps supporting the fight against COVID 19 pandemic in relation to data protection (2020/C 124 I/01)
[12] Pseudonymisation techniques and best practices n.d
[13] Imprese aperte, lavoratori protetti - Imprese aperte, Lavoratori protetti n.d. /i_rapporti/imprese_aperte_lavoratori_protetti (accessed
[14] So You're Going Outside: A Physics-Based Coronavirus Infection Risk Estimator for Leaving the House
[15] Show evidence that apps for COVID-19 contact-tracing are secure and effective