key: cord-0252622-uxhme6r1 authors: Kricka, Larry J.; Polevikov, Sergei; Park, Jason Y.; Fortina, Paolo; Bernardini, Sergio; Satchkov, Daniel; Kolesov, Valentin; Grishkov, Maxim title: Artificial Intelligence-Powered Search Tools and Resources in the Fight Against COVID-19 date: 2020-06-02 journal: EJIFCC DOI: nan sha: 5574afe507ddcd0ced340444e67f9c49767f327c doc_id: 252622 cord_uid: uxhme6r1
Emerging technologies are set to play an important role in our response to the COVID-19 pandemic. This paper explores three prominent initiatives: COVID-19 focused datasets (e.g., CORD-19); Artificial intelligence-powered search tools (e.g., WellAI, SciSight); and contact tracing based on mobile communication technology. We believe that increasing awareness of these tools will be important in future research into the disease, COVID-19, and the virus, SARS-CoV-2.
The COVID-19 pandemic has created unprecedented challenges for the medical and clinical diagnostic community. The fight against COVID-19 is being supported by a number of databases and artificial intelligence (AI)-based initiatives aimed at assessing dissemination of the disease [1], aiding in detection and diagnosis, minimizing the spread of the disease, and facilitating and accelerating research globally [2] [3] [4] [5] [6] [7]. Prominent among these initiatives are: the COVID-19 Open Research Dataset [8-10], and databases curated by the CDC [11, 12], NLM [13], and the WHO [14]; AI-powered tools such as those from WellAI [15, 16] and the Allen Institute for AI (SciSight) [17] [18] [19]; and contact tracing based on mobile communication technology [20, 21].
The CORD-19 Dataset has resulted from a partnership between the Semantic Scholar team at the Allen Institute for AI and leading research groups (Chan Zuckerberg Initiative, Georgetown University's Center for Security and Emerging Technology, Microsoft Research, the Kaggle AI platform (Google), and the National Library of Medicine-National Institutes of Health) in coordination with the White House Office of Science and Technology Policy. Publications in the collection are sourced from PubMed Central, the bioRxiv and medRxiv preprint servers, and the WHO COVID-19 Database. CORD-19 is freely available and downloadable, and it is updated weekly. The collection currently contains over 128,000 publications (over 59,000 with full text as of 26 May 2020) on the disease, COVID-19, the virus, SARS-CoV-2, and related coronaviruses. It is part of a call to action to the AI community to develop AI techniques that generate new insights to assist in the fight against COVID-19 [9]. This call to action is framed as a series of tasks, posed as the questions listed in Table 1 [22].
Table 1 COVID
What is known about transmission, incubation, and environmental stability?
What do we know about COVID-19 risk factors?
What do we know about virus genetics, origin, and evolution?
What do we know about vaccines and therapeutics?
What has been published about medical care?
What do we know about non-pharmaceutical interventions?
What do we know about diagnostics and surveillance?
What has been published about ethical and social science considerations?
What has been published about information sharing and inter-sectoral collaboration?
Analysis of the vast amount of COVID-19 data that has already accumulated (e.g., CORD-19 Dataset, COVID-19 cases data, Hospital Data and case statistics) [23] is a daunting challenge. However, this type of big data problem is amenable to AI-based search tools [24] such as those from WellAI and the Allen Institute for AI (SciSight). AI-powered tools that exploit natural language processing (NLP) have several advantages over a conventional search engine, e.g., unlocking buried information [25] [26] [27], and these are summarized in Table 2. WellAI has developed a Machine Learning (ML) search and analytics tool for interrogation of the CORD-19 Dataset, based on four neural networks and incorporating the complete list of NIH medical categories [Unified Medical Language System (UMLS)] semantic types; it is available at https://wellai.health/covid/ [16]. It is now widely agreed that ML has significant applications in the physical and biological sciences [28]. In the WellAI COVID-19 application, a subset of ML (i.e., neural networks) is being used. Neural networks facilitate discovery of highly complex and nonlinear relationships between sets of variables without having to search for a closed-form mathematical solution. Neural networks can contain tens of thousands to millions of variables, and this is the basis of their power. The complexity of the relationships that neural networks can uncover is difficult to fathom, but it is enabled by ever-increasing computing power. Somewhat surprisingly, one of the biggest trends of the past 10 years is the increasing scientific role of neural network models of language. At first glance it seems counterintuitive that something so qualitative and subjective as language plays a role in learning about the physical or biological sciences, which by their nature strive for precision.
However, NLP is set to play a major role in scientific learning over the coming decades, because arguably the biggest 'problem' for scientists today is an ever-growing body of data, which defies any traditional tools of comprehension [29]. For example, the CORD-19 dataset already contains >128,000 articles. Digesting such a vast amount of information quickly is feasible only with NLP methods, which can extend beyond capturing "known knowledge" to reveal new information and hidden connections [27]. The WellAI COVID-19 application uses NLP neural networks to 'learn' from the CORD-19 dataset in order to summarize existing knowledge. It can also be used to make discoveries in an unsupervised manner. This application is based on unsupervised learning [19, 20], but its main goal is to enable a researcher to generate ideas for the next set of concepts that are relevant to the discovery. The UMLS concepts are used as variables in the model, and these concepts provide a vast terminology. Crucially, they deal with synonymy, and by including all of the synonyms, the number of UMLS concepts increased to 4,224,512! Only 60,892 concepts are used in the WellAI COVID-19 model, grouped into 69 categories (or UMLS semantic types). Broader WellAI models are based on >25 million medical articles and use millions of concepts. These concepts are a helpful starting point. However, they had to be altered for the WellAI models because they are somewhat outdated, specifically when it comes to the terminology surrounding the novel coronavirus. The altered concepts were applied to the CORD-19 dataset. This whole process was not trivial because application of concepts requires context: the same word can mean different things in different contexts. Complex ML models sensitive to the context of an article therefore needed to be developed. A series of WellAI neural network models have been utilized to learn relationships between medical concepts.
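The role that synonymy plays in mapping text onto concepts can be illustrated with a small sketch. This is not the WellAI method, which the text describes as context-sensitive ML models; it is a plain dictionary lookup, shown only to make the idea of "many surface phrases, one concept" concrete. The synonym table, the concept identifiers and the function name `extract_concepts` are all hypothetical and are not real UMLS entries.

```python
import re

# Hypothetical synonym table: surface phrase -> concept identifier.
# The identifiers are made-up placeholders, NOT real UMLS concept IDs.
SYNONYM_TO_CONCEPT = {
    "hypertension": "CONCEPT_HYPERTENSION",
    "high blood pressure": "CONCEPT_HYPERTENSION",
    "elevated blood pressure": "CONCEPT_HYPERTENSION",
    "covid-19": "CONCEPT_COVID19",
    "coronavirus disease 2019": "CONCEPT_COVID19",
}


def extract_concepts(text, table=SYNONYM_TO_CONCEPT):
    """Return the set of concept identifiers whose synonyms occur in text.

    Matching is case-insensitive and anchored so that a phrase is not
    matched inside a longer word. No context is taken into account.
    """
    lowered = text.lower()
    found = set()
    for phrase, concept in table.items():
        # (?<!\w) and (?!\w) require a non-word boundary on both sides.
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.add(concept)
    return found


if __name__ == "__main__":
    print(extract_concepts("Patients with high blood pressure and COVID-19"))
```

Because every synonym maps to the same identifier, two articles that use different wording for the same condition contribute to the same variable. A lookup of this kind cannot resolve words whose meaning depends on context, which is the gap the context-sensitive models mentioned above are meant to close.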
Relationships of any single concept to a

Table 2. AI-powered search tool compared with a conventional search engine

General objective
AI-powered tool: Neural networks summarize, generalize and predict relationships.
Conventional search engine: Searches for key words and phrases in an article. Cannot make conclusions about relationships.

AI-powered tool: Understands synonyms and correlated concepts. For example, understands that "hypertension" is a synonym for "high blood pressure" and "elevated blood pressure". This knowledge helps build more accurate relationships between concepts.
Conventional search engine: The results produced match the search words or phrases, without knowledge of synonyms and related concepts.

Result aggregated and summarized?
AI-powered tool: Yes. Every single concept suggestion is based on a large number of articles.
Conventional search engine: No. The result is a list of articles that contain the key words or phrases.

AI-powered tool: A structured list of concepts with ranked probabilities. This narrows the scope of work and results in greater efficiency. Focus on concepts of interest and exploration of relationships, not only between concepts (e.g., COVID-19 and Diagnostics Radiology), but between clusters of concepts (e.g., COVID-19 + Diagnosis, Clinical + Diagnostic Tests and Diagnostics Radiology).
Conventional search engine: A list of every single occurrence (i.e., every article) of a word or a phrase. Read the articles (time consuming), summarize, and make generalizations.

AI-powered tool: Starting with "COVID-19" as the preselected concept, selecting "READ ARTICLES" corresponding to "Diagnosis, Clinical" produces a list of articles in which the machine learning models have determined there is a relationship between COVID-19 and clinical diagnosis, and not just the whole list of articles that mention both COVID-19 and clinical diagnosis. In addition, the models know there is a difference between clinical diagnosis and diagnosis.
Conventional search engine: The result for the search terms "COVID-19" and "clinical diagnosis" is a list of all articles that mention "COVID-19" and "clinical diagnosis", irrespective of whether there is a relationship between the two phrases in the article. For example, hypothetically speaking, the article may not be about clinical diagnosis at all; the phrase "clinical diagnosis" may merely be mentioned in the References section.

At a practical level, searching the CORD-19 Dataset using the WellAI tool begins with the results of the initial analysis, based on COVID-19 and SARS Coronavirus as the preloaded concepts, and this produces a list of 69 concept categories. Each concept category has an associated list of concepts, ranked according to their significance in relation to COVID-19 based on log probability (or negative log likelihood loss) [30] of the strength of the concept relationship to COVID-19, according to the WellAI neural networks. For clinical diagnostics there are several relevant major concept categories in the list, including: "Diagnostic Procedure"; "Laboratory Procedure"; "Laboratory or Test Result". Associated with each major concept category is a list of related concepts, each linked to relevant publications ("READ ARTICLES"). The search can be refined by adding any of the concepts to the "Selected Concepts" list. A rerun of the search ("Find by selected concepts" option) produces new lists of the concepts that are most related to the new list of "Selected Concepts" (Figure 1). Underlying this AI-powered tool is a network of servers that makes the searching quick and seemingly effortless. Significantly, most of the questions in Table 1 could be answered by the WellAI COVID-19 tool by entering a concept (e.g., transmission mode) or looking at the relevant concept category (e.g., "Gene or Genome" for the virus genetics and virus origin question).
SciSight is an AI-powered visualization tool for exploring associations between concepts appearing in the CORD-19 Dataset and visualizing the emerging literature network around COVID-19 [17] [18] [19] [31]. It is available at: https://SciSight.apps.allenai.org/ [17].
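Returning briefly to the WellAI concept lists described above, the idea of ranking concepts by log probability can be sketched in a few lines. The sketch assumes that a model has already assigned each candidate concept a probability of being related to the selected concept; how those probabilities are produced is not shown. The concept names, the probabilities and the function name `rank_concepts` are made-up illustrations, not WellAI output.

```python
import math


def rank_concepts(probabilities):
    """Rank concepts by log probability, strongest relationship first.

    probabilities maps a concept name to a model-assigned probability in
    (0, 1]. Concepts with probability 0 are skipped because log(0) is
    undefined. The negative log likelihood is simply -log(p), so sorting
    by descending log probability is the same as ascending loss.
    """
    scored = [
        (concept, math.log(p))
        for concept, p in probabilities.items()
        if p > 0
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up probabilities for three illustrative concepts.
    example = {
        "Chest CT": 0.25,
        "Polymerase chain reaction": 0.40,
        "Serologic test": 0.10,
    }
    for concept, log_p in rank_concepts(example):
        print(f"{concept}: log p = {log_p:.3f}")
```

Working in log space keeps the ordering of the probabilities intact while avoiding very small numbers, which is one common reason ranked lists of this kind are reported as log probabilities.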
SciSight is based on SciBERT, a language model pretrained on a large corpus of scientific publications to provide improved performance in natural language processing [32]. Initially, SciSight provides four different search options: two scientific concepts that are important to the study of the virus ("Proteins/genes/cells" and "Diseases/chemicals"), a "Network of Science" search, and a "Faceted search". The user can explore associations between either of the two preselected scientific concepts ("Proteins/genes/cells" or "Diseases/chemicals") in the CORD-19 Dataset as follows. Selection of one of the preselected concepts displays the "Try:" list below the search box, which lists salient keywords with high relevance to SARS-CoV-2. There is also a graphical display of the network of associations between the preselected scientific concept and the top related terms mined from the Dataset. A thicker edge signifies that the terms are co-mentioned more often in close proximity to each other in publications in the database. Clicking on an edge reveals the list of linked full text papers, and hovering over a term reveals co-mentioned terms. This is illustrated in Figure 2 for the associations between the preselected concept "Diseases/chemicals" and the key words "virus infection" selected from the "Try:" list. Alternatively, one of the two preselected scientific concepts can be chosen and a search term entered. This generates and displays a list of autocompleted search suggestions. Selecting one of these suggestions again displays the network of top associations in the dataset.
A "Network of science" search option allows the user to visualize research groups and their ties in the context of COVID-19. Searches can be by "Topics", "Affiliation" or "Authors", or by the seven preloaded topics in the "Try:" list. Multiple combinations of "Topics", "Affiliation", or "Authors" can be selected.
Results are shown as a network of boxes that are color coded from high to low relevance. Each box shows the top authors, top affiliations and top topics in a group, and the color-coded links between boxes reveal shared authors or topics. Selection of a box provides a list of publications relating to the contents of that particular box. Also, results are ranked within each topic category (e.g., "Author") by means of a shaded bar; a list of relevant publications is provided, and a graphic shows the number of papers per year.
Population-wide datasets are now emerging that show the response of society to COVID-19. The data include commonly used terms in internet search engines, satellite mapping data of human activity and the emerging interactive data from digital contact tracing. Contact tracing is an essential monitoring process for combating the spread of an infectious disease [19] [20] [21]. It comprises three basic steps: 1) Contact identification; 2) Contact listing; and 3) Contact follow-up. It forms one part of the "Test, Trace and Quarantine" mantra. Conventionally, contact tracing is a manual process that relies on finding individuals who have tested positive, and then interviewing those individuals to identify everyone who needs to be quarantined. The widespread availability of mobile communication technology (e.g., smartphones) is providing new ways of enabling contact tracing by using Bluetooth to track nearby phones, keep logs of those contacts, and warn people about others with whom they have been in contact. In the digital age, contact tracing can be achieved passively and integrated with diagnostic testing results. On an individual level, the actions can be bi-directional. An individual can test positive and then initiate a cascade of notifications of all recent contacts.
Alternatively, an individual can be notified that they were in Bluetooth proximity to an anonymous person who has tested positive. Public health authorities empowered with digital tracing can quickly identify positive contacts with a minimal workforce. In the US, Apple and Google are collaborating on tracking technology for iOS and Android smartphones [33]. Elsewhere in the world, an example of a contact tracing app is TraceTogether, which has been deployed in Singapore [34, 35]. If a person is found to be positive for COVID-19, the app uses the smartphone's Bluetooth to notify every participating TraceTogether user who was within 2 meters of that person for more than 30 minutes. In China, the Alipay Health Code on the Alipay app dictates freedom of travel based on three categories: green for unrestricted travel, yellow for a seven-day quarantine, and red for a two-week quarantine [36]. In South Korea, people receive location-based emergency text messages from the government to inform them if they have been in the vicinity of a confirmed case of COVID-19 [37]. In Italy, the app "Immuni" [38, 39] combines a personal clinical diary and contact tracing. Anonymous identification codes are generated by the user's app rather than a central server in order to improve privacy. By placing identification on the individual user's device, the contact tracing information is kept separate from identification. The app complies with the European model outlined by the PEPP-PT (Pan European Privacy-Preserving Proximity Tracing) consortium [40]. It is delivered for free and on a voluntary basis. There has been resistance to app-based monitoring [39], but the Italian government expects that 60-70% of people will download the app. In the UK, a contact tracing app (NHS COVID-19) is currently being trialed in a limited geographical area with a population of ~140,000 [41].
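The proximity rule quoted above for TraceTogether (within 2 meters for more than 30 minutes) can be written as a simple filter over logged encounters. The sketch below is illustrative only and is not the TraceTogether implementation: the `Encounter` record, its field names and the function `users_to_notify` are hypothetical, and it assumes that a distance estimate and a duration are already available for each logged encounter.

```python
from dataclasses import dataclass

# Thresholds taken from the rule quoted in the text.
MAX_DISTANCE_M = 2.0
MIN_DURATION_MIN = 30.0


@dataclass
class Encounter:
    """Hypothetical log entry for one encounter with another app user."""
    other_user: str      # anonymous identifier of the other device
    distance_m: float    # estimated distance in meters (assumed available)
    duration_min: float  # length of the encounter in minutes


def users_to_notify(encounters,
                    max_distance_m=MAX_DISTANCE_M,
                    min_duration_min=MIN_DURATION_MIN):
    """Return sorted identifiers of users who meet the proximity rule.

    An encounter qualifies when the distance is at most max_distance_m
    and the duration is strictly greater than min_duration_min. Each
    encounter is judged on its own; durations are not summed.
    """
    qualifying = {
        e.other_user
        for e in encounters
        if e.distance_m <= max_distance_m and e.duration_min > min_duration_min
    }
    return sorted(qualifying)


if __name__ == "__main__":
    log = [
        Encounter("user-a", 1.5, 45.0),   # close and long enough
        Encounter("user-b", 1.0, 30.0),   # exactly 30 min, not "more than"
        Encounter("user-c", 3.0, 120.0),  # long but too far away
    ]
    print(users_to_notify(log))
```

Keeping the two thresholds as parameters makes it easy to see how sensitive the notification list is to the chosen distance and duration cut-offs.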
The NHS COVID-19 app registers the duration of contact and the distance between devices, and the data are fed into a centralized system where a risk algorithm estimates infection risk and triggers notifications. Other examples of pandemic data infrastructures include the Google tool, COVID Near You, to identify patterns and hot spots by location (zip-code) [ COVID-19 based on the geographic location of their smartphones [44]; and a hashtag tracking tool for the evolution of COVID hashtags on Twitter (>628 million tweets about COVID-19) [45]. Twitter is also being used to understand the impact of COVID-19 (e.g., psychological impact) [46].
One significant concern over digital contact tracing has been ethical issues (e.g., privacy) and the consequent impact on the rate of adoption of the apps [47, 48]. Some technology developers are focused on developing tracing apps that ensure privacy protection [49].
Currently, in response to COVID-19, clinical laboratories and the IVD industry are grappling with test development, test validation, fast-track clearance (e.g., Emergency Use Authorization) [50], availability of analyzers, tests and related supplies, and testing capacity for both molecular tests for SARS-CoV-2 and tests for IgM/IgG antibodies against this virus [51, 52]. Once these issues have been resolved, the next major hurdle will be contact tracing to reduce the risk of future outbreaks. AI-powered tools will be valuable for identifying trends and associations between digital contact tracing, tests and outbreaks of disease.
Easily accessible AI-powered tools and databases are valuable in all types of research, but especially so in the context of the urgent diagnostic and therapeutic challenges presented by the COVID-19 pandemic. It is hoped that the new AI-powered search tools will accelerate research and development in COVID-19 as the world strives to develop efficient and timely testing and effective therapies to combat this disastrous pandemic.
Another important part of our fight against COVID-19 will be efficient digital contact tracing enabled by mobile communication technology linked with massively scaled-up testing, as outlined in the recent "Roadmap to Pandemic Resilience" [53].
Pneumonia of unknown aetiology in Wuhan, China: potential for international spread via commercial air travel
Artificial intelligence (AI) applications for COVID-19 pandemic
Artificial intelligence and machine learning to fight COVID-19
Digital technology and COVID-19
Blockchain and artificial intelligence technology for novel coronavirus disease 2019 self-testing
On the coronavirus (COVID-19) outbreak and the smart city network: Universal data sharing standards coupled with artificial intelligence (AI) to benefit urban health monitoring and management
Identification of COVID-19 under quarantine
Call to action to the tech community on new machine readable COVID-19 dataset
The COVID-19 Open Research Dataset
COVID-19 research articles downloadable database
A weekly surveillance summary of U.S. COVID-19 activity
2019-ncov/covid-data/covidview/index.html
WHO. Global research on coronavirus disease
WellAI develops COVID-19 research tool in response to White House's call to action. PR Newswire
COVID-19 machine learning analytics for researchers
SciSight is a tool for exploring the evolving network of science in the COVID-19 Open Research Dataset, from Semantic Scholar at the Allen Institute for AI
Exploring the COVID-19 network of scientific research with SciSight
Combining faceted navigation and research group detection for COVID-19 exploratory scientific search
Operational planning guidelines to support country preparedness and response. WHO
COVID-19 open research dataset challenge (CORD-19)
List of COVID-19 resources for machine learning and data science research
How to fight COVID-19 with machine learning. Towards Data Science
Natural language processing in text mining for structural modeling of protein complexes
Clinical text data in machine learning: Systematic review
Unsupervised word embeddings capture latent knowledge from materials science literature
Machine learning algorithms: A review
Recent trends in deep learning based natural language processing
Negative log likelihood ratio loss for deep neural network classification
Helping scientists visualize and explore COVID-19 literature with AI. Ai2. https://medium.com/ai2-blog/coviz-helping-scientists-visualizeand-explore-covid-19-literature-with-ai-9359559368e5
A pretrained language model for scientific text
Does Covid-19 contact tracing pose a privacy risk? Your questions
Singapore introduces contact tracing app to slow coronavirus spread. ZDNet
In coronavirus fight, China gives citizens a color code, with red flags
Health System Global
Italy tests contact-tracing app to speed lockdown exit
NHS COVID-19: The UK's coronavirus contacts-tracing app explained
Providing people with coronavirus-related data most useful to them. Berkeley Engineering
TweetBinder blog. #Covid 19 -Twitter evolution. Twitter Analytics
Using Twitter to understand the impact of COVID-19
Coronavirus contact-tracing apps: What are the privacy concerns? ZDNet
Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing
Clever cryptography could protect privacy in covid-19 contact-tracing apps
medical-devices/emergency-situations-medicaldevices/emergency-use-authorizations
51. IFCC information guide on COVID-19
AACC. Coronavirus resources for labs