Extracting COVID-19 Diagnoses and Symptoms From Clinical Text: A New Annotated Corpus and Neural Event Extraction Framework

Kevin Lybarger, Mari Ostendorf, Matthew Thompson, Meliha Yetisgen

December 2, 2020

Abstract

Coronavirus disease 2019 (COVID-19) is a global pandemic. Although much has been learned about the novel coronavirus since its emergence, there are many open questions related to tracking its spread, describing symptomology, predicting the severity of infection, and forecasting healthcare utilization. Free-text clinical notes contain critical information for resolving these questions. Data-driven, automatic information extraction models are needed to use this text-encoded information in large-scale studies. This work presents a new clinical corpus, referred to as the COVID-19 Annotated Clinical Text (CACT) Corpus, which comprises 1,472 notes with detailed annotations characterizing COVID-19 diagnoses, testing, and clinical presentation. We introduce a span-based event extraction model that jointly extracts all annotated phenomena, achieving high performance in identifying COVID-19 and symptom events with associated assertion values (0.83-0.97 F1 for events and 0.73-0.79 F1 for assertions). In a secondary use application, we explored the prediction of COVID-19 test results using structured patient data (e.g., vital signs and laboratory results) and automatically extracted symptom information. The automatically extracted symptoms improve prediction performance beyond structured data alone.

As of October 11, 2020, there were over 37 million confirmed COVID-19 cases globally, resulting in 1 million related deaths [1]. Tracking the spread of COVID-19 and estimating the true number of infections remain challenges for policy makers, healthcare workers, and researchers, even as testing availability increases. Symptom information provides useful indicators for tracking potential infections and disease clusters [2]. Certain symptoms and underlying comorbidities have directed COVID-19 testing. However, the clinical presentation of COVID-19 varies significantly in severity and symptom profiles [3]. The most prevalent COVID-19 symptoms reported to date are fever, cough, fatigue, and dyspnea [4], but emerging reports identify additional symptoms, including diarrhea and neurological symptoms, such as changes in taste or smell [5-7]. Certain initial symptoms may be associated with higher risk of complications; in one study, dyspnea was associated with a two-fold increased risk of acute respiratory distress syndrome [8]. However, correlations between symptoms, positive tests, and rapid clinical deterioration are not well understood in ambulatory care and emergency department settings.

Routinely collected information in the Electronic Health Record (EHR) can provide crucial COVID-19 testing, diagnosis, and symptom data needed to address these knowledge gaps. Laboratory results, vital signs, and other structured data can easily be queried and analyzed at scale; however, more detailed and nuanced descriptions of COVID-19 diagnoses, exposure history, symptoms, and clinical decision-making are typically documented only in the clinical narrative. To leverage this textual information in large-scale studies, the salient COVID-19 and symptom information must be automatically extracted.
This work presents a new corpus of clinical text annotated for COVID-19, referred to as the COVID-19 Annotated Clinical Text (CACT) Corpus. CACT consists of 1,472 notes from the University of Washington (UW) clinical repository with detailed event-based annotations for COVID-19 diagnosis, testing, and symptoms. To the best of our knowledge, CACT is the first clinical data set with COVID-19 annotations, and it includes 29.9K distinct events. We present the first information extraction results on CACT using an end-to-end neural event extraction model, establishing a strong baseline for identifying COVID-19 and symptom events. We explore the prediction of COVID-19 test results (positive or negative) using structured EHR data and automatically extracted symptoms, and we find that the automatically extracted symptoms improve prediction performance.

Given the recent onset of COVID-19, there are limited COVID-19 corpora for natural language processing (NLP) experimentation. Corpora of scientific papers related to COVID-19 are available [9, 10], and automatic labels for biomedical entity types are available for some of these research papers [11]. However, we are unaware of corpora of clinical text with supervised COVID-19 annotations. Multiple clinical corpora are annotated for symptoms. As examples, South et al. [12] annotated symptoms and other medical concepts with negation (present/not present), temporality, and other attributes. Koeling et al. [13] annotated a pre-defined set of symptoms related to ovarian cancer. For the i2b2/VA challenge, Uzuner et al. [14] annotated medical concepts, including symptoms, with assertion values and relations. While some of these corpora may include symptom annotations relevant to COVID-19 (e.g., "cough" or "fever"), the distribution and characterization of symptoms in these corpora may not be consistent with COVID-19 presentation. To fill the gap in clinical COVID-19 annotations, including symptoms, we introduce CACT, a relatively large corpus with COVID-19 diagnosis, testing, and symptom annotations.

There is a significant body of information extraction (IE) work related to coreference resolution, relation extraction, and event extraction tasks. In these tasks, spans of interest are identified, and linkages between spans are predicted. Many contemporary IE systems use end-to-end multi-layer neural models that encode an input word sequence using recurrent or transformer layers, classify spans (entities, arguments, etc.), and predict the relationships between spans (coreference, relation, role, etc.) [15-20]. Of most relevance to our work is a series of developments starting with Lee et al. [21], which introduces a span-based coreference resolution model that enumerates all spans in a word sequence, predicts entities using a feed-forward neural network (FFNN) operating on span representations, and resolves coreferences using a FFNN operating on entity span-pairs. Luan et al. [22] adapts this framework to entity and relation extraction, with a specific focus on scientific literature. Luan et al. [23] extends the method to take advantage of both co-reference and relation links in a graph-based approach that jointly predicts entity spans, co-references, and relations. By updating span representations in multi-sentence co-reference chains, the graph-based approach achieved state-of-the-art results on several IE tasks spanning a range of genres. Wadden et al. [24] expands on Luan et al. [23]'s approach, adapting it to event extraction tasks.
We build on Luan et al. [22] and Wadden et al. [24]'s work, augmenting the modeling framework to fit the CACT annotation scheme. In CACT, event arguments are generally close to the associated trigger, and inter-sentence events linked by co-reference are infrequent, so the graph-based extension, which adds complexity, is unlikely to benefit our extraction task.

Many recent NLP systems use pre-trained language models (LMs), such as ELMo, BERT, and XLNet, which leverage unannotated text [25-27]. IE systems incorporate the LM output through a variety of strategies, including using the contextualized word embedding sequence as the input to a Conditional Random Field entity extraction layer [28], using it as the basis for building span representations [23, 24], or adding an entity-aware attention mechanism and pooled output states to a fully transformer-based model [29]. There are many domain-specific LM variants. Here, we use Alsentzer et al. [30]'s Bio+Clinical BERT, which is trained on PubMed papers and MIMIC-III [31] clinical notes, for building span representations.

There are many pre-print and published works exploring the prediction of COVID-19 outcomes, including COVID-19 infection, hospitalization, acute respiratory distress syndrome, need for intensive care unit (ICU) admission, need for a ventilator, and mortality [32-44]. These COVID-19 outcomes are typically predicted using existing structured data within the EHR, including demographics, diagnosis codes, vitals, and lab results, although Izquierdo et al. [37] incorporates automatically extracted information using the existing EHRead tool. Our literature review identified 24 laboratory, vital sign, and demographic structured fields that are predictive of COVID-19: age, alanine aminotransferase (ALT), albumin, alkaline phosphatase (ALP), aspartate aminotransferase (AST), basophils, calcium, C-reactive protein (CRP), D-dimer, eosinophils, gamma-glutamyl transferase (GGT), gender, heart rate, lactate dehydrogenase (LDH), lymphocytes, monocytes, neutrophils, oxygen saturation, platelets, prothrombin time (PT), respiratory rate, temperature, troponin, and white blood cell (WBC) count. Table 3 in the Appendix details the specific publications associated with each of these fields. While some fields are frequently cited (e.g., age, AST, CRP, LDH, lymphocytes, neutrophils, and temperature), there does not appear to be a consensus across the literature regarding the most prominent predictors of COVID-19 infection. These 24 predictive fields informed the development of our COVID-19 prediction work in Section 5. Prediction architectures include logistic regression, Support Vector Machines (SVM), decision trees, random forests, K-nearest neighbors, Naïve Bayes, and multilayer perceptrons [37, 38, 42-44].

This work used inpatient and outpatient clinical notes from the UW clinical repository. COVID-19-related notes were identified by searching for variations of the terms coronavirus, covid, sars-cov, and sars-2 in notes authored between February 20 and March 31, 2020, resulting in a pool of 92K notes. Samples were randomly selected for annotation from a subset of 53K notes that include at least five sentences and correspond to the note types: telephone encounters, outpatient progress, emergency department, inpatient nursing, intensive care unit, and general inpatient medicine.
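The keyword search can be illustrated with a minimal sketch. The exact term variants and note schema used at UW are not specified in this paper, so the pattern below is an assumption:

```python
import re

# Case-insensitive pattern approximating the search terms named above:
# "coronavirus", "covid", "sars-cov", and "sars-2". The hyphen/space
# handling is an assumption about what "variations" covers.
COVID_PATTERN = re.compile(r"coronavirus|covid|sars[-\s]?cov|sars[-\s]?2", re.IGNORECASE)

def is_covid_related(note_text: str) -> bool:
    """Return True if the note text mentions any COVID-19 term variant."""
    return COVID_PATTERN.search(note_text) is not None
```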
Multiple note types were used to improve the generalizability of the extraction model. Early in the outbreak, the UW EHR did not include COVID-19-specific structured data; however, structured fields indicating COVID-19 test types and results were added as testing expanded. We used these structured fields to assign a COVID-19 Test label describing COVID-19 polymerase chain reaction (PCR) testing to each note, based on the patient's test status within the UW system (no data external to UW was used; a sketch of this labeling rule follows below):

• none: patient testing information is not available
• positive: the patient will have at least one future positive test
• negative: the patient will only have future negative tests

More nuanced descriptions of COVID-19 testing (e.g., conditional or unordered tests) or diagnoses (e.g., possible infection or exposure) are not available as structured data. For the 53K note subset, the COVID-19 Test label distribution is 90.8% none, 7.9% negative, and 1.3% positive. Given the sparsity of positive and negative notes, CACT is intentionally biased to increase the prevalence of these labels. To ensure adequate positive training samples, the CACT training partition includes 46% none, 5% negative, and 49% positive notes. Ideally, the test set would be representative of the true distribution; however, the expected number of positive labels under random selection is insufficient to evaluate extraction performance. Consequently, the CACT test partition was biased to include 50% none, 46% negative, and 4% positive notes. Notes were randomly selected in equal proportions from the six note types. CACT includes 1,472 annotated notes: 1,028 train and 444 test notes.
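The COVID-19 Test labeling rule described above can be sketched as follows. This is a minimal illustration; the input schema is a hypothetical stand-in for the UW structured test data:

```python
def covid_test_label(future_results: list[str]) -> str:
    """Assign the note-level COVID-19 Test label from the patient's PCR
    results dated after the note, e.g. ["negative", "positive"]."""
    if not future_results:
        return "none"      # no testing information available
    if "positive" in future_results:
        return "positive"  # at least one future positive test
    return "negative"      # only future negative tests
```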
We created detailed annotation guidelines for two event types, COVID and Symptom, which are summarized in Table 1.

Table 1: Event types, argument types, argument subtypes, and span examples.

Event type, e | Argument type, a | Argument subtypes, L_l | Span examples
COVID | Trigger* | - | "COVID," "COVID-19"
COVID | Test Status† | {positive, negative, pending, conditional, not ordered, not patient, indeterminate} | "tested positive"
COVID | Assertion† | {present, absent, possible, hypothetical, not patient} | "positive," "low suspicion"
Symptom | Trigger* | - | "cough," "shortness of breath"
Symptom | Assertion* | {present, absent, possible, conditional, hypothetical, not patient} | "admits," "denies"
Symptom | Change | {no change, worsened, improved, resolved} | "improved," "continues"
Symptom | Severity | {mild, moderate, severe} | "mild," "required ventilation"
Symptom | Anatomy | - | "chest wall," "lower back"
Symptom | Characteristics | - | "wet productive," "diffuse"
Symptom | Duration | - | "for two days," "1 week"
Symptom | Frequency | - | "occasional," "chronic"

COVID and Symptom are annotated as events, where each event includes a trigger that identifies and anchors the event and arguments that characterize the event. For COVID events, the trigger is generally an explicit COVID-19 reference, like "COVID-19" or "coronavirus." Test Status characterizes implicit and explicit references to COVID-19 testing, and Assertion captures diagnoses and hypothetical references to COVID-19. Symptom events capture subjective, often patient-reported, indications of disorders and diseases (e.g., "cough"). For Symptom events, the trigger identifies the specific symptom, for example "wheezing" or "fever," which is characterized through Assertion, Change, Severity, Anatomy, Characteristics, Duration, and Frequency arguments. Symptoms were annotated for all conditions/diseases, not just COVID-19.

The annotation scheme includes two types of arguments: labeled arguments and span-only arguments. Labeled arguments (e.g., Assertion) include an argument span, type, and subtype (e.g., present). Span-only arguments, like Characteristics, include an argument span and type, without a subtype label. Notes were annotated using the BRAT annotation tool [45]. Figure 1 presents BRAT annotation examples.

Annotation and extraction are scored as a slot filling task, focusing on the information most relevant to secondary use applications. Figure 2 presents the same sentence annotated by two annotators, along with the populated slots for the Symptom event. Both annotations include the same trigger and Frequency spans ("cough" and "intermittent," respectively). The Assertion spans differ ("presenting with" vs. "presenting"), but the assigned subtypes (present) are the same, so the annotations are equivalent for purposes of populating a database; both map to the slot representation SSx(trigger="cough", Assertion=present, Frequency="intermittent"). Annotator agreement and extraction performance are assessed using scoring criteria that reflect this slot filling interpretation of the labeling task.

The Symptom trigger span identifies the specific symptom. For COVID, the trigger anchors the event, although the span text is not salient to downstream applications. For labeled arguments, the subtype label captures the most salient information, and the identified span is less informative. For span-only arguments, the spans are not easily mapped to a fixed label set, so the selected span contains the salient information. Performance is evaluated using precision, recall, and F1.

Trigger: Triggers, $T_i$, are represented by a pair (event type, $e_i$; token indices, $x_i$). Trigger equivalence is defined as
$$T_i \equiv T_j \iff (e_i = e_j) \wedge (x_i = x_j).$$

Arguments: Events are aligned based on trigger equivalence. The arguments of events with equivalent triggers are compared using different criteria for labeled arguments and span-only arguments. Labeled arguments, $L_i$, are represented as a triple (argument type, $a_i$; token indices, $x_i$; subtype, $l_i$). For labeled arguments, the argument type, $a$, and subtype, $l$, capture the salient information, and equivalence is defined as
$$L_i \equiv L_j \iff (a_i = a_j) \wedge (l_i = l_j).$$
Span-only arguments, $S_i$, are represented as a pair (argument type, $a_i$; token indices, $x_i$). Span-only arguments with equivalent triggers and argument types, $a_i = a_j$, are compared at the token level (rather than the span level) to allow partial matches. Partial match scoring is used because partial matches can still contain useful information.
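A minimal sketch of these equivalence criteria, assuming triggers and arguments are represented as Python dicts (the exact evaluation code is not part of this paper):

```python
def triggers_equivalent(t1: dict, t2: dict) -> bool:
    # Triggers match when both the event type and the token indices agree.
    return t1["event_type"] == t2["event_type"] and t1["indices"] == t2["indices"]

def labeled_args_equivalent(a1: dict, a2: dict) -> bool:
    # For labeled arguments attached to equivalent triggers, only the
    # argument type and subtype matter; span indices are ignored.
    return a1["arg_type"] == a2["arg_type"] and a1["subtype"] == a2["subtype"]

def span_only_token_match(gold_arg: dict, pred_arg: dict) -> tuple[int, int, int]:
    # Span-only arguments are compared at the token level, crediting partial
    # overlap; returns (#matched, #gold, #predicted) token counts, from which
    # precision, recall, and F1 can be computed.
    gold, pred = set(gold_arg["indices"]), set(pred_arg["indices"])
    return len(gold & pred), len(gold), len(pred)
```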
CACT includes 1,472 notes with a 70%/30% train/test split and 29.9K annotated events (5.4K COVID and 24.4K Symptom). Figure 3 summarizes the COVID annotation statistics for the train/test subsets. By design, the training and test sets include high rates of COVID-19 infection (the present subtype for Assertion and the positive subtype for Test Status), with higher rates in the training set. CACT includes high rates of the Assertion hypothetical and possible subtypes. The hypothetical subtype applies to sentences like "She is mildly concerned about the coronavirus" and "She cancelled nexplanon replacement due to COVID-19." The possible subtype applies to sentences like "risk of Covid exposure" and "Concern for respiratory illness (including COVID-19 and influenza)." Test Status pending is also frequent. There is some variability in the endpoints of the annotated COVID trigger spans (e.g., "COVID" vs. "COVID test"); however, 98% of the COVID trigger spans in the training set start with the tokens "COVID," "COVID19," or "coronavirus." Since the COVID trigger span is only used to anchor and disambiguate events, the COVID trigger spans were truncated to the first token of the annotated span in all experimentation and results.

The training set includes 1,756 distinct uncased Symptom trigger spans, 1,425 of which occur fewer than five times. Figure 4 presents the frequency of the 20 most common Symptom trigger spans in the training set by Assertion subtypes present, absent, and other (possible, conditional, hypothetical, or not patient). The extracted symptoms in Figure 4 were manually normalized to aggregate different extracted spans with similar meanings (e.g., "sob" and "short of breath" → "shortness of breath"; "febrile" and "fevers" → "fever"). These 20 symptoms account for 62% of the training set Symptom events. There is ambiguity in delineating between some symptoms and other clinical phenomena (e.g., exam findings and medical problems), which introduces some annotation noise.

All annotation was performed by four UW medical students. After the first round of annotation, annotator disagreements were carefully reviewed, the annotation guidelines were updated, and annotators received additional training. Additionally, potential COVID triggers were pre-annotated using pattern matching ("COVID," "COVID-19," "coronavirus," etc.) to improve the recall of COVID annotations. Pre-annotated COVID triggers were modified as needed by the annotators, including removing, shifting, and adding trigger spans. Figure 5 presents the annotator agreement for the second round of annotation, which included 96 doubly annotated notes. For labeled arguments, F1 scores are micro-averaged across subtypes.

Event extraction tasks, like ACE05 [46], typically require prediction of the following event phenomena:

• trigger span identification
• trigger type (event type) classification
• argument span identification
• argument type/role classification

The CACT annotation scheme differs from this configuration in that labeled arguments require both the argument type (e.g., Assertion) and the subtype (e.g., present, absent, etc.) to be predicted. Resolving the argument subtypes requires a classifier with additional predictive capacity. We implement a span-based, end-to-end, multi-layer event extraction model that jointly predicts all event phenomena, including the trigger span, event type, and argument spans, types, and subtypes.

Candidate spans, $s_i$, are enumerated from the input token sequence, where $start(s_i)$ and $end(s_i)$ denote the start and end token indices of span $s_i$. The representation $g_i$ of span $s_i$ is calculated as the attention-weighted sum of the bi-LSTM hidden states,
$$g_i = \sum_{t=start(s_i)}^{end(s_i)} \alpha_t h_t,$$
where $h_t$ is the bi-LSTM hidden state for token $t$ (size $2v_h$) and the attention weights $\alpha_t$ are normalized over the tokens of the span. Label scores for span classifier $c$ are computed as
$$\phi_c(s_i) = w_{s,c}\,\mathrm{FFNN}_{s,c}(g_i),$$
where $\phi_c(s_i)$ yields a vector of label scores of size $|L_c|$, $\mathrm{FFNN}_{s,c}$ is a non-linear projection from size $2v_h$ to $v_s$, and $w_{s,c}$ has size $|L_c| \times v_s$. The trigger prediction label set is $L_{trigger}$ = {null, COVID, Symptom}. Separate classifiers are used for each labeled argument (Assertion, Change, Severity, and Test Status) with label set $L_c$ = {null} ∪ $L_l$, where $L_l$ is defined in Table 1. Argument role scores for a candidate trigger-argument span pair $(s_j, s_k)$ are computed as
$$\psi_d(s_j, s_k) = w_{r,d}\,\mathrm{FFNN}_{r,d}([g_j; g_k]),$$
where $\psi_d(s_j, s_k)$ is a vector of size 2, $\mathrm{FFNN}_{r,d}$ is a non-linear projection from size $4v_h$ to $v_r$, and $w_{r,d}$ has size $2 \times v_r$.

Span pruning: To limit the time and space complexity of the pairwise argument role predictions, only the top-K spans for each span classifier, $c$, are considered during argument role prediction. The span score is calculated as the maximum label score in $\phi_c$, excluding the null label score.
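The span classification layer can be sketched in PyTorch as follows. This is a minimal illustration of the attention-weighted span representation and FFNN scorer described above, not the authors' released implementation; the layer sizes and attention parameterization are assumptions:

```python
import torch
import torch.nn as nn

class SpanClassifier(nn.Module):
    """One span-label scorer: an attention-weighted sum of bi-LSTM hidden
    states (size 2*v_h) is projected by a FFNN to size v_s, then scored
    against |L_c| labels (including null)."""
    def __init__(self, v_h: int, v_s: int, num_labels: int):
        super().__init__()
        self.attn = nn.Linear(2 * v_h, 1)                  # token attention scores
        self.ffnn = nn.Sequential(nn.Linear(2 * v_h, v_s), nn.ReLU())
        self.out = nn.Linear(v_s, num_labels, bias=False)  # w_{s,c}

    def forward(self, hidden: torch.Tensor, spans: list) -> torch.Tensor:
        # hidden: (seq_len, 2*v_h) bi-LSTM output for one sentence
        # spans: (start, end) token index pairs, inclusive
        reps = []
        for start, end in spans:
            h = hidden[start:end + 1]                   # (span_len, 2*v_h)
            alpha = torch.softmax(self.attn(h), dim=0)  # normalize over span tokens
            reps.append((alpha * h).sum(dim=0))         # attention-weighted sum g_i
        return self.out(self.ffnn(torch.stack(reps)))   # label scores phi_c(s_i)
```

One such classifier scores trigger labels ({null, COVID, Symptom}), and separate instances score each labeled argument type.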
The model configuration was selected using 3-fold cross validation (CV) on the training set. Table 4 in the Appendix summarizes the selected configuration. Training loss was calculated by summing the cross-entropy loss across all span and argument role classifiers. Models were implemented using the Python PyTorch module [47].

During initial experimentation, Symptom Assertion extraction performance was high for the absent subtype and lower for present. The higher absent performance is primarily associated with the consistent presence of negation cues, like "denies" or "no." While there are affirming cues, like "reports" or "has," the present subtype is often implied by a lack of negation cues; for example, an entire sentence could be "Short of breath." To provide the Symptom Assertion span classifier with a more consistent span representation, we substituted the Symptom trigger token indices for the Symptom Assertion token indices in each event and found that performance improved. We extended this trigger token index substitution to all labeled arguments (Assertion, Change, Severity, and Test Status) and again found that performance improved. By substituting the trigger indices for the labeled argument indices, trigger and labeled argument prediction is roughly treated as a multi-label classification problem, although the model is not constrained to require trigger and labeled argument predictions to be associated with the same spans. As previously discussed, the scoring routine does not consider the span indices of labeled arguments. Frequency extraction performance is lower than annotator agreement, and Change, Severity, and Characteristics extraction performance is low, likely related to the low annotator agreement for these arguments.

This section explores the prediction of COVID-19 test results, utilizing structured EHR data and automatically extracted symptom information from clinical notes. An existing clinical data set from UW, spanning January 2020 through May 2020, was used to explore the prediction of COVID-19 test results and identify the most prominent predictors of COVID-19. The data set represents 230K patients, including 28K patients with at least one COVID-19 PCR test result. The data set includes telephone encounters, outpatient progress notes, and emergency department (ED) notes, as well as structured data (demographics, vitals, laboratory results, etc.). It overlaps with the data set used in Section 4.1 but is treated as a separate data set in this COVID-19 prediction task. The notes in the CACT training set are less than 1% of the notes used in this secondary use application.

Features: Symptom information was automatically extracted from the notes using the Span-based Event Extractor trained on CACT. The extracted symptoms were manually normalized to aggregate different extracted spans with similar meanings. Each extracted symptom with an Assertion value of present was assigned a feature value of 1 (a sketch of this feature construction follows below). The 24 identified predictors of COVID-19 from the existing literature (see Section 2) were mapped to 32 distinct fields within the UW EHR and used in experimentation. The identified fields are listed in Table 5 of the Appendix. For the coded data (e.g., structured fields like "basophils"), experimentation was limited to this subset of literature-supported COVID-19 predictors, given the limited number of positive COVID-19 tests in this data set.
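The symptom feature construction can be sketched as follows. The normalization map is a hypothetical stand-in for the paper's manual normalization, and the event dicts are an assumed representation of the extractor's output:

```python
# Hypothetical examples of the manual span normalization described above.
NORMALIZATION = {
    "sob": "shortness of breath",
    "short of breath": "shortness of breath",
    "febrile": "fever",
    "fevers": "fever",
}

def symptom_features(events: list[dict]) -> dict[str, int]:
    """Map each extracted Symptom event with Assertion == 'present' to a
    1-valued binary feature; missing symptom features are later set to 0."""
    features: dict[str, int] = {}
    for event in events:
        if event.get("Assertion") == "present":
            name = event["trigger"].lower()
            features[NORMALIZATION.get(name, name)] = 1
    return features
```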
Within the 7-day history, features may occur multiple times (e.g., multiple temperature measurements). For each such feature, the series of values was represented as the minimum or maximum of the values, depending on the specific feature. For example, temperature was represented as the maximum of the measurements to detect any fever, and oxygen saturation was represented as the minimum of the values to capture any low oxygenation events. Table 5 in the Appendix includes the aggregating function, f, used for each structured field. Where symptom features were missing, the feature value was set to 0. For features from the structured EHR data, which are predominantly numerical, missing features were assigned the mean feature value in the set used to train the COVID-19 prediction model.

Model: COVID-19 was predicted using the Random Forest framework, because it facilitates nonlinear modeling with interdependent features and interpretability analyses (the Scikit-learn Python implementation was used [48]). Alternative prediction algorithms considered include Logistic Regression, SVM, and FFNN. Logistic Regression assumes feature independence and linearity, which is not valid for this task; for example, the feature set includes both the symptom "fever" and temperature measurements (e.g., "38.6 °C"), which are clearly interdependent. Separate models were trained and evaluated for each note type (ED, progress, and telephone) and feature set (structured, notes, and all). The selected Random Forest hyperparameters are summarized in Table 6 in the Appendix.

Structured data are more complete in the ED note experimentation than in the experimentation with progress and telephone notes, due to the higher prevalence of vital sign measurements and laboratory testing in proximity to ED visits. In the ED note experimentation, over 99% of samples include vital signs and 72% include blood work; in the progress and telephone note experimentation, only 23% of samples include vital signs.

Given the relatively small sample size and low proportion of positive COVID-19 tests, the SHAP impact values presented in Figure 8 were aggregated across repeated hold-out runs. Figure 9 presents the averaged SHAP values for each repeated hold-out run for the eight most predictive features of the all feature set. The most predictive symptoms include: for progress notes, fever, myalgia, respiratory symptoms, cough, and ill; and for telephone notes, fever, cough, myalgia, fatigue, and sore throat. The differences in symptom importance by note type reflect differences in documentation across the clinical settings (e.g., emergency department, outpatient, and tele-visit).
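A minimal sketch of the training and repeated hold-out evaluation loop, assuming a prepared feature matrix; the data, split ratio, and hyperparameters below are placeholders, not the paper's selected configuration (see its Table 6). SHAP values could be computed per run with the shap package's TreeExplainer and averaged across runs in the same way:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))      # stand-in for structured + symptom features
y = rng.integers(0, 2, size=500)    # stand-in for COVID-19 test results

scores = []
for seed in range(10):              # repeated hold-out runs
    idx = np.random.RandomState(seed).permutation(len(y))
    split = int(0.8 * len(y))       # assumed 80/20 hold-out split
    train, test = idx[:split], idx[split:]
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X[train], y[train])
    scores.append(model.score(X[test], y[test]))

print(f"mean accuracy over {len(scores)} runs: {np.mean(scores):.3f}")
```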
We present CACT, a novel corpus with detailed annotations for COVID-19 diagnoses, testing, and symptoms. CACT includes 1,472 unique notes across six note types, with more than 500 notes from patients with future COVID-19 testing. We implement the Span-based Event Extractor, which jointly extracts all annotated phenomena, including argument types and subtypes. The Span-based Event Extractor achieves near-human performance in the extraction of COVID triggers (0.97 F1), Symptom triggers (0.83 F1), and Assertions (0.79 F1). In a COVID-19 prediction task, automatically extracted symptom information improved the prediction of COVID-19 test results (with statistical significance) beyond structured data alone, and the top predictive symptoms include fever, cough, and myalgia.

The secondary use application is limited by the size and scope of the available data. In future work, the extractor will be applied to a much larger set of clinical ambulatory care and emergency department notes from UW. The extracted symptom information will also be combined with routinely coded data (e.g., diagnosis and procedure codes, demographics) and automatically extracted data (e.g., social determinants of health [51]). Using these data, we will develop models for predicting the risk of COVID-19 infection among individuals who are tested. These models could better inform clinical indications for prioritizing testing under constrained test availability. Additionally, the presence or absence of certain symptoms can be used to inform clinical care decisions with greater precision. This future work may also identify combinations of symptoms (including their presence, absence, severity, sequence of appearance, duration, etc.) associated with clinical outcomes and health service utilization, such as a deteriorating clinical course and the need for repeat consultation or hospital admission. Detailed symptom information will be highly valuable in informing these models, but potentially only with the level of nuance that our extraction models provide. With the COVID-19 pandemic continuing for the foreseeable future, accelerating the research outlined in this paper will inform key clinical and health service decision making.

References

World Health Organization, Coronavirus disease (COVID-19) weekly epidemiological update and weekly operational update.
A framework for identifying regional outbreak and spread of COVID-19 from one-minute population-wide surveys.
Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: Summary of a report of 72,314 cases from the Chinese Center for Disease Control and Prevention.
Prevalence of comorbidities in the novel Wuhan coronavirus (COVID-19) infection: a systematic review and meta-analysis.
Clinical features of COVID-19.
COVID-19 transmission within a family cluster by presymptomatic carriers in China.
Presymptomatic transmission of SARS-CoV-2 - Singapore.
Risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease.
CORD-19: The COVID-19 open research dataset.
World Health Organization, Global literature on coronavirus disease.
Comprehensive named entity recognition on CORD-19 with distant or weak supervision.
Developing a manually annotated clinical document corpus to identify phenotypic information for inflammatory bowel disease.
Annotating a corpus of clinical text records for learning to recognize symptoms automatically.
i2b2/VA challenge on concepts, assertions, and relations in clinical text.
Joint entity and relation extraction based on a hybrid neural network.
Event detection with neural networks: A rigorous empirical evaluation.
Extracting entities with attributes in clinical text via joint deep learning.
A deep neural network model for joint entity and relation extraction.
Extracting medications and associated adverse drug events using a natural language processing system combining knowledge base and deep learning.
Adverse drug events and medication relation extraction in electronic health records with ensemble deep learning methods.
End-to-end neural coreference resolution.
Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction.
A general framework for information extraction using dynamic span graphs.
Entity, relation, and event extraction with contextualized span representations.
Deep contextualized word representations.
BERT: Pre-training of deep bidirectional transformers for language understanding.
Generalized autoregressive pretraining for language understanding.
BERT-based multi-head selection for joint entity-relation extraction.
Extracting multiple-relations in one-pass with pre-trained transformers.
Publicly available clinical BERT embeddings, Clinical Natural Language Processing Workshop.
MIMIC-III, a freely accessible critical care database.
Predictors of mortality in hospitalized COVID-19 patients: A systematic review and meta-analysis.
Predictors of adverse prognosis in COVID-19: A systematic review and meta-analysis.
Predictive symptoms and comorbidities for severe COVID-19 and intensive care unit admission: a systematic review and meta-analysis.
A novel simple scoring model for predicting severity of patients with SARS-CoV-2 infection.
Risk factors for adverse clinical outcomes with COVID-19 in China: a multicenter, retrospective, observational study.
Clinical characteristics and prognostic factors for ICU admission of patients with COVID-19 using machine learning and natural language processing.
Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal.
Epidemiology and clinical features of COVID-19: A review of current literature.
Risk factors of severe disease and efficacy of treatment in patients infected with COVID-19: A systematic review, meta-analysis and meta-regression analysis.
Detection of COVID-19 infection from routine blood exams with machine learning: A feasibility study.
Artificial intelligence-enabled rapid diagnosis of patients with COVID-19.
Personalized predictive models for symptomatic COVID-19 patients using basic preconditions: Hospitalizations, mortality, and the need for an ICU or ventilator.
BRAT: a web-based tool for NLP-assisted text annotation.
ACE 2005 multilingual training corpus LDC2006T06.
PyTorch: An imperative style, high-performance deep learning library.
Scikit-learn: machine learning in Python.
From local explanations to global understanding with explainable AI for trees.
Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap.
Annotating social determinants of health using active learning, and characterizing determinants using neural event extraction.

Acknowledgments

This work has been funded by the Gordon and Betty Moore Foundation and by the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1 TR002319. We want to acknowledge Elizabeth Chang, Kylie Kerker, Jolie Shen, and Erica Qiao for their contributions to the gold standard annotations and Nicholas Dobbins for data management and curation. Research and results reported in this publication were partially facilitated by the generous contribution of computational resources from the University of Washington Department of Radiology.
Appendix

Table 3 (excerpt): Structured fields and the publications supporting each as a predictor of COVID-19.

age: [38, 39, 42, 43]
alanine aminotransferase (ALT): [40, 42]
albumin: [41]
alkaline phosphatase (ALP): [38, 42, 43]

Table 5: Structured fields from the UW EHR used to predict COVID-19 infection. f indicates the function used to aggregate multiple measurements/values. Fields that measure the same phenomena were treated as a single feature.

Parameter | Fields in UW EHR | f
age | "AgeIn2020" | max
ALT | "ALT (GPT)" | max
albumin | "Albumin" | min
ALP | "Alkaline Phosphatase (Total)" | max
AST | "AST (GOT)" | max
basophils | "Basophils" and "% Basophils" | min
calcium | "Calcium" | min
CRP | "CRP, high sensitivity" | max
D-dimer | "D Dimer Quant" | max
eosinophils | "Eosinophils" and "% Eosinophils" | min
GGT | "Gamma Glutamyl Transferase" | max
gender | "Gender" | last
heart rate | "Heart Rate" and "HR" | max
LDH | "Lactate Dehydrogenase" | max
lymphocytes | "Lymphocytes" and "% Lymphocytes" | min
monocytes | "Monocytes" | max
neutrophils | "Neutrophils" and "% Neutrophils" | max
oxygen saturation | "Oxygen Saturation" and "O2 Saturation (%)" | min
platelets | "Platelet Count" | min
PT | "Prothrombin Time Patient" and "Prothrombin INR" | max
respiratory rate | "Respiratory Rate" | max
temperature | "Temperature -C" and "Temperature (C)" | max
troponin | "Troponin I" and "Troponin I Interpretation" | max
WBC count | "WBC" | min