key: cord-0218138-2w06dpam authors: Lybarger, Kevin; Mabrey, Linzee; Thau, Matthew; Bhatraju, Pavan K.; Wurfel, Mark; Yetisgen, Meliha title: Identifying ARDS using the Hierarchical Attention Network with Sentence Objectives Framework date: 2021-03-10 journal: nan DOI: nan sha: 3fb4c22c05b3d4d2a27fb7694cd0dc9822a54275 doc_id: 218138 cord_uid: 2w06dpam Acute respiratory distress syndrome (ARDS) is a life-threatening condition that is often undiagnosed or diagnosed late. ARDS is especially prominent in those infected with COVID-19. We explore the automatic identification of ARDS indicators and confounding factors in free-text chest radiograph reports. We present a new annotated corpus of chest radiograph reports and introduce the Hierarchical Attention Network with Sentence Objectives (HANSO) text classification framework. HANSO utilizes fine-grained annotations to improve document classification performance. HANSO can extract ARDS-related information with high performance by leveraging relation annotations, even if the annotated spans are noisy. Using annotated chest radiograph images as a gold standard, HANSO identifies bilateral infiltrates, an indicator of ARDS, in chest radiograph reports with performance (0.87 F1) comparable to human annotations (0.84 F1). This algorithm could facilitate more efficient and expeditious identification of ARDS by clinicians and researchers and contribute to the development of new therapies to improve patient care. Coronavirus disease 2019 is caused by infection with the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) and is associated with high mortality. 1 A high-risk complication of COVID-19 infection is the development of the acute respiratory distress syndrome (ARDS), which is characterized by severe inflammatory lung injury. Other common hospital diagnoses, such as sepsis, pneumonia, and trauma, are also associated with the development of ARDS. 
Interventions to prevent injury from invasive mechanical ventilation and differences in clinical management have improved clinical outcomes in patients with ARDS; 2-4 however, ARDS is commonly underrecognized by clinicians. In an epidemiologic study involving 500 intensive care units across 50 countries, over 40% of all ARDS cases were not recognized by clinicians, and the diagnosis of over 30% of ARDS cases was delayed. 5 In another study, investigators demonstrated that delays in initiating evidence-based treatments were associated with increased hospital mortality in patients with ARDS. 6 The identification of ARDS requires the assessment of lung injury patterns in chest imaging. A primary contributor to undiagnosed ARDS is the challenge of incorporating radiologist-derived chest imaging findings into diagnostic algorithms for ARDS. Per the "Berlin Definition," ARDS diagnosis requires: 7
• timing: condition occurs within one week of a known clinical insult or new/worsening respiratory symptoms
• chest imaging: bilateral opacities that are not fully explained by effusions, lobar or lung collapse, nodules, or masses
• non-cardiogenic edema: alveolar infiltrates are not fully explained by cardiac failure or hydrostatic edema
• oxygenation: oxygenation measurements meet defined thresholds (mild, moderate, and severe)
The oxygenation component requires decreased oxygenation and is generally documented in structured data in the electronic health record (EHR). The non-cardiogenic edema component requires an absence of hydrostatic edema, and the associated risk factors may be captured in structured admit diagnosis codes or the clinical narrative. The timing component requires a proximal risk factor for respiratory failure, for example the presence of COVID-19, and may be documented through lab results or diagnosis codes.
The information needed to assess the chest imaging requirements is typically represented in chest radiographs (x-rays) and computed tomography images, as well as the associated free-text reports describing radiologists' findings and interpretation. Data-driven computer vision approaches for directly analyzing chest radiograph images are still in development and are computationally expensive. This work explores the automatic identification of the chest imaging requirements for ARDS in free-text chest radiograph reports. We use natural language processing (NLP) information extraction techniques to identify descriptions of opacities (increased radiodensity), classify the opacities as parenchymal (indicative of alveolar edema/infiltrates) or extraparenchymal (outside the lungs or not indicative of alveolar edema/infiltrates), resolve sidedness (unilateral or bilateral), capture size information (small, moderate, or large), and indicate negation ("not present"). We developed detailed annotation guidelines that include summary document-level annotations and detailed relation annotations that characterize opacities. Using this novel annotation scheme, we created a new annotated corpus of 420 chest radiograph reports, referred to as the Pulmonologist Annotated Corpus (PAC). This work presents the Hierarchical Attention Network with Sentence Objectives (HANSO) framework, which is an end-to-end neural model that utilizes both the document-level and relation annotations. We introduce an approach for leveraging entity and relation annotations with noisy spans to improve document classification performance within the HANSO framework. We compare the performance of HANSO against two gold standards: manually annotated chest radiograph reports and manually annotated chest radiograph images.
HANSO achieves very high performance in identifying the presence of bilateral infiltrates, a key indicator of ARDS, relative to both the annotated reports (0.87 F1) and annotated images (0.87 F1). HANSO also identifies factors that are less consistent with ARDS, specifically extraparenchymal opacities, with high performance (0.80 F1).

Many works explore NLP information extraction techniques with radiology reports. 8 Within this body of radiology research, several works explore the identification of pulmonary conditions in chest radiograph reports. Most prior pulmonary information extraction work implements discrete document classification models where labels are assigned at the document level, without utilizing word-level annotations or predictions. Bejan et al. identify pneumonia in chest radiograph reports using Support Vector Machines (SVM) with word n-grams, medical concepts, and other features. 9 Yetisgen et al. automatically identify acute lung injury in chest radiograph reports using Maximum Entropy (MaxEnt) models that utilize word n-grams and assertion predictions (present vs. absent). 10 Afshar et al. and Mayampurath et al. predict ARDS in chest radiograph reports from word n-gram and medical concept features using discrete modeling approaches, including decision trees, k-nearest neighbors, naive Bayes, logistic regression, and SVM. 11, 12 Mayampurath et al. achieve the best performance in predicting ARDS using unigram term frequency-inverse document frequency (TF-IDF) features with SVM, 12 which we implement here as a baseline. Some recent NLP work with chest radiograph reports utilizes continuous, neural modeling approaches. Datta et al. annotate approximately 2,000 chest radiograph reports using a detailed relation-based annotation scheme that characterizes radiology phenomena across multiple dimensions.
Datta et al. implement neural entity and relation extraction models, including a baseline model consisting of stacked bidirectional long short-term memory (bi-LSTM) and conditional random field layers, as well as transformer-based approaches using BERT and XLNet. 13 Apostolova et al. investigate ARDS using both clinical text and structured EHR data from the MIMIC-III database, exploring ARDS likelihood, mortality, and risk factors using learned vector patient representations. 14 Apostolova et al.'s patient vectors incorporate information from clinical notes, diagnosis codes, and other structured data using Convolutional Neural Networks and Gradient Boosting Machine. This work is differentiated from prior ARDS-related information extraction work in multiple ways. This work presents a new detailed annotation scheme that identifies indicators and confounding factors for ARDS, including document-level summary labels and detailed relation annotations describing the support or evidence for the document-level labels. It introduces a new end-to-end, neural multitask model that predicts the document-level labels and utilizes the detailed relation annotations to augment learning. Additionally, this work presents an approach for leveraging noisy entity and relation annotations.

This work utilized two existing clinical data sets from the University of Washington Harborview and Montlake campuses. The first data set, Data set A, includes 831 chest radiograph reports for 173 patients from February-September 2020. Data set A includes patients who were being evaluated under suspicion for COVID-19 and admitted to a medical or trauma intensive care unit (ICU). Inclusion criteria were: (1) ICU admission; (2) suspicion for COVID-19; (3) invasive mechanical ventilation; (4) presence of at least one partial pressure of arterial blood oxygen-to-fraction of inspired oxygen ratio (PaO2/FiO2) less than 300 mmHg.
For Data set A, an expert chest radiologist annotated 154 radiograph images using the Berlin criteria to identify patients with diffuse bilateral pulmonary opacities. The second data set, Data set B, includes 1,279 radiograph reports for 788 patients from March-November, 2020. Data set B includes all patients hospitalized with COVID-19, resulting in a broader patient population than Data set A with varying degrees of severity of illness from COVID-19 infection. Annotation Scheme: We developed a detailed annotation scheme that facilitates the identification of lung infiltrates and extraparenchymal opacities. Table 1 summarizes the annotated phenomena, and Figure 1 presents annotation examples from the BRAT annotation tool. 15 Each report was annotated with two categories of labels: document labels and relational labels. The document labels summarize the annotators' overall assessment of each chest radiograph report with two multiclass labels: infiltrates -consistent with ARDS and extraparenchymal -less consistent with ARDS. The document classes are: none -insufficient information for assessment or absence explicitly stated; present -condition present but sidedness unknown; unilateral -explicitly one lung; and bilateral -explicitly both lungs. The string, " INFILTRATES EXTRAPARENCHYMAL ," was appended to each report to facilitate the assignment of these document labels. The relation annotations are evidence for the document labels and include annotated spans and links between spans. Although the annotated spans are not necessarily noun phrases, we refer to the spans as "entities" here. The entity types include region, side, size, and negation. The annotation of region, side, and size includes an identified span and the assignment of a subtype label that normalizes the span contents, mapping the phrase to a clinically significant label. For example, all region entities include a subtype label of parenchymal or extraparenchymal. 
The negation entity only includes an annotated span without a subtype label, although the type label conveys the span meaning (i.e. "absent"). The relation annotations indicate whether a side, size, or negation entity is an attribute of a region entity. All attribute (attr) relation annotations are unidirectional, where the first entity in all relations has type region. The annotation scheme provides the information necessary to categorize each chest radiograph report with respect to the radiologic criteria for ARDS, namely the presence of bilateral opacities that are not fully explained by effusions, lobar or lung collapse, nodules, or masses. The presence of opacities was qualified as infiltrates (indicative of alveolar process) and/or extraparenchymal (indicative of effusions, collapse, nodules/masses, and atelectasis) with additional annotation of any report text documenting laterality and size. This approach mirrors the clinical heuristic used by expert radiologists and pulmonary/critical care clinicians when assessing the likelihood that a chest radiograph indicates the presence of ARDS. In our annotation scheme the document labels, infiltrates and extraparenchymal, are the best indicators of ARDS, and the relation annotations are included to support the document labels.

Annotation Scoring and Evaluation: The goal of this work is to extract salient information from chest radiograph reports and convert it to a structured representation that will complement other types of structured clinical data (e.g. PaO2/FiO2 ratio) to predict ARDS. The assessment of annotator agreement and extraction performance focuses on the information in the annotation schema that is most relevant to the large-scale, automated assessment of ARDS. For each entity, the subtype label captures the important span information, such that the associated text span is less informative. Figure 2 presents the same sentence annotated by two annotators.
Both annotators label a region entity with subtype parenchymal that is connected to a side entity with subtype unilateral. Although the annotators label different spans for the region entities ("midlung, basilar opacities" vs. "opacities"), both annotations identify unilateral parenchymal opacities (i.e. opacities in one lung). For the purposes of predicting ARDS, these annotations are equivalent, even though there are span differences. For entities with equivalent type and subtype labels, the spans are evaluated under two criteria: any overlap and partial match. Under the any overlap criterion, spans are considered equivalent if there is at least one overlapping token, and the performance is assessed based on span counts. For the region annotation in Figure 2, the entity spans "midlung, basilar opacities" and "opacities" overlap, so there is one matching span. Under the partial match criterion, spans are compared at the token level to allow partial matches, and performance is assessed based on the number of matching tokens. For the region annotation in Figure 2, the entity spans have one matching token ("opacities") and three mismatched tokens ("midlung", ",", and "basilar"). There is only one relation type (attribute or attr), and two relations are equivalent if the entities paired by the attribute relation are equivalent under the any overlap criterion. Performance is evaluated using precision (P), recall (R), and F1-score (F1).

Annotator Agreement: As part of the annotation guideline development and annotator training, we doubly annotated 20 reports, assessed inter-annotator agreement, updated the annotation guidelines, and provided additional annotator training. At the conclusion of the project, we doubly annotated 10 additional reports to assess the agreement for the annotated corpus. The agreement for these 10 reports is presented in Table 2. The agreement for both document labels is high (0.90 F1).
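The any overlap and partial match criteria described above can be illustrated with a short sketch. It is not the scoring code used in the study: the token-offset span representation, the function names, and the choice of which annotator serves as the reference are all assumptions made for illustration.

```python
from typing import Tuple

# (start, end) token offsets, end exclusive; an assumed span representation
Span = Tuple[int, int]


def any_overlap(a: Span, b: Span) -> bool:
    """True if the two spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def shared_tokens(a: Span, b: Span) -> int:
    """Number of token positions covered by both spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def prf(n_match: int, n_pred: int, n_gold: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from match, predicted, and reference counts."""
    p = n_match / n_pred if n_pred else 0.0
    r = n_match / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


# Figure 2 region spans; treating the longer span as the reference is arbitrary.
reference = (0, 4)  # "midlung , basilar opacities" -> 4 tokens
candidate = (3, 4)  # "opacities" -> 1 token

# Any overlap: counts are in spans (one span each, one match).
overlap_scores = prf(int(any_overlap(reference, candidate)), 1, 1)

# Partial match: counts are in tokens (1 shared, 1 candidate, 4 reference).
partial_scores = prf(
    shared_tokens(reference, candidate),
    candidate[1] - candidate[0],
    reference[1] - reference[0],
)
```

Under these assumptions the two spans count as a perfect match under any overlap, while the token-level comparison penalizes the three unmatched tokens, which is the pattern seen in the lower partial match agreement for region entities.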
The entity agreement under the any overlap criterion is also high (0.85-0.96 F1). The entity agreement using the partial match criterion remains high for negation and side entities; however, it is relatively low for region. These results suggest the annotators are generally labeling the same phenomenon; however, they differ in the selected spans, similar to the example in Figure 2. Relation agreement is very high for region-negation entity pairs (0.96 F1) and lower for region-side pairs (0.76 F1). While the region span annotations are noisy, the entity and relation annotations still contain useful information for assessing ARDS.

The likelihood of a patient satisfying the chest imaging requirements for the Berlin definition of ARDS can be estimated from the document labels, infiltrates and extraparenchymal. We introduce the Hierarchical Attention Network with Sentence Objectives (HANSO) framework, shown in Figure 4, to predict these document labels and incorporate the relation annotation information. HANSO is a neural, end-to-end, multi-task model that includes sentence encoding and document encoding layers. It builds on Yang et al.'s hierarchical attention network (HAN). 16 HAN aggregates word-level information to the sentence level and then aggregates sentence-level information to the document level. Sentences are encoded using a multi-layer network consisting of a recurrent neural network (RNN) and self-attention, and documents are encoded using separate RNN and self-attention layers operating on the encoded sentences. We build on HAN, incorporating sentence-level prediction tasks to augment the learning of the sentence representations. The sentence targets are derived from the relation annotations. The sentence and document encoding layers in Figure 4 are implemented for each of the document labels, infiltrates and extraparenchymal, with a shared input recurrent layer. HANSO omits the additional RNN included in the document encoding layer of HAN.
In our initial experimentation, we implemented a span-based relation extraction model for the entity and relation labels, similar to our previous work, 17 and tried using the extracted relation information to improve the prediction of the document labels. However, the entity and relation extraction performance was insufficient to improve the prediction of the document labels, which are the most important labels in our annotation schema. Contributing factors to the low entity and relation extraction performance include the small data set size and the variability in the annotated spans, especially for region entities. The subtype labels in our annotation scheme (e.g. unilateral and bilateral for the side entities) normalize the span information, so the noisiness in the annotated spans does not negatively impact the informativeness of the annotations. However, this noise in the span annotation does negatively impact model learning and span prediction. To utilize the detailed relation information, the relations are converted to a one-hot encoding for each sentence, capturing the salient relation and entity information without explicitly identifying entity spans. Figure 5 presents examples of this relations-to-sentence label mapping process. Relations consisting of region-side pairs are represented as the subtype label pairs: {parenchymal-unilateral, parenchymal-bilateral, extraparenchymal-unilateral, extraparenchymal-bilateral}. Relations consisting of region-negation pairs are represented as the region subtype label and negation type: {parenchymal-negation, extraparenchymal-negation}. While this approach does not explicitly capture span information, this sentence-level encoding of the relations creates a summary of the most important annotated phenomena within each sentence. The size entities are infrequent within the corpus and are omitted from experimentation.
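The relation-to-sentence label mapping described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the tuple layout for a relation, the function names, and the ordering of the label list are assumptions, while the six label strings come from the sets listed in the text.

```python
from typing import Dict, List, Optional, Tuple

# Sentence-level binary targets named in the text (the ordering is an assumption).
SENTENCE_TASKS: List[str] = [
    "parenchymal-unilateral",
    "parenchymal-bilateral",
    "parenchymal-negation",
    "extraparenchymal-unilateral",
    "extraparenchymal-bilateral",
    "extraparenchymal-negation",
]

# Hypothetical relation layout: (region subtype, attribute type, attribute subtype).
Relation = Tuple[str, str, Optional[str]]


def relation_label(rel: Relation) -> Optional[str]:
    """Map one attr relation to a sentence label, ignoring span text entirely."""
    region_subtype, attr_type, attr_subtype = rel
    if attr_type == "side":
        return f"{region_subtype}-{attr_subtype}"
    if attr_type == "negation":
        return f"{region_subtype}-negation"
    return None  # size relations are omitted, as in the experiments


def sentence_targets(relations: List[Relation]) -> Dict[str, int]:
    """Binary target per sentence-level task for one sentence."""
    targets = {task: 0 for task in SENTENCE_TASKS}
    for rel in relations:
        label = relation_label(rel)
        if label in targets:
            targets[label] = 1
    return targets


# Hypothetical sentence with three annotated relations.
example = [
    ("parenchymal", "side", "bilateral"),
    ("extraparenchymal", "negation", None),
    ("parenchymal", "size", "small"),
]
encoded = sentence_targets(example)
vector = [encoded[task] for task in SENTENCE_TASKS]
```

Because a sentence can contain several relations, more than one position in the resulting vector may be set, which matches the later description of the targets as a set of binary sentence-level prediction tasks.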
In the description of the HANSO framework below, the subscripts $i$, $j$, and $k$ indicate the $i$-th BERT word piece position, $j$-th sentence, and $k$-th document. We only include the $i$, $j$, and $k$ subscripts below that are needed to resolve ambiguity.

Input encoding: Input documents are split into sentences and tokenized using spaCy. 18 The default en_core_web_sm spaCy configuration is used, except that line breaks always indicate a sentence boundary. Sentences are mapped to contextualized word embeddings using Bio+Clinical BERT. 19 The contextualized BERT word piece embeddings feed into a bi-LSTM without fine-tuning BERT (no backpropagation to BERT). We tried several different architectures involving fine-tuning BERT, but these architectures did not outperform HANSO, which is likely due to the very small training set. The forward and backward states of the bi-LSTM are concatenated to form $h_i$ with size $1 \times l_h$, where $i$ is the word piece position.

Figure 5: Relation-to-sentence label mapping example

Sentence encoding: Each sentence is represented as the attention-weighted sum of the word piece vectors. The bi-LSTM hidden state for each word piece is nonlinearly projected from size $l_h$ to $l_p$ as

$$u_i = \tanh(W_u h_i + b_u) \quad (1)$$

where $W_u$ is a weight matrix and $b_u$ is a bias vector. Word-level attention weights, $\alpha_u$, are calculated using dot product attention as

$$\alpha_{u,i} = \frac{\exp(u_i^\top z_u)}{\sum_{i'} \exp(u_{i'}^\top z_u)} \quad (2)$$

where $u_i$ is the projected word piece input and $z_u$ is a learned vector of size $l_p$. The representation of sentence $j$ is calculated as

$$s_j = \sum_i \alpha_{u,i} h_i \quad (3)$$

where $s_j$ has size $l_h$. A set of binary sentence-level prediction tasks, $R$, are incorporated based on the one-hot encoding of the relations described above. For each sentence-level task, $r \in R$, the sentence vector is nonlinearly projected from size $l_h$ to $l_p$ as

$$v_{r,j} = \tanh(W_{v,r} s_j + b_{v,r}) \quad (4)$$

Label scores for task $r$ are calculated using a linear projection from size $l_p$ to 2 as

$$\psi_{r,j} = W_{\psi,r} v_{r,j} + b_{\psi,r} \quad (5)$$

where $\psi_{r,j}$ are the label scores for task $r$ and sentence $j$.
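The sentence encoding step above (nonlinear projection, dot product attention, and an attention-weighted sum of the bi-LSTM states) can be sketched in plain Python. This is an illustrative toy under stated assumptions, not the authors' PyTorch implementation: the function names, the list-based vectors, and the tiny hand-picked weights are all hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def project(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Nonlinear projection tanh(W x + b), mapping size l_h to size l_p."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
        for row, bi in zip(W, b)
    ]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(H: Matrix, W: Matrix, b: Vector, z: Vector) -> Tuple[Vector, Vector]:
    """Attention-weighted sum of hidden states H (one row per word piece).

    Scores are dot products between projected states and the learned vector z;
    the pooled vector keeps the size of the unprojected states (l_h).
    """
    U = [project(W, b, h) for h in H]
    scores = [sum(ui * zi for ui, zi in zip(u, z)) for u in U]
    alpha = softmax(scores)
    pooled = [sum(a * h[d] for a, h in zip(alpha, H)) for d in range(len(H[0]))]
    return pooled, alpha


# Toy example: two word pieces, l_h = l_p = 2, hand-picked (hypothetical) weights.
H = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]

uniform_sentence, uniform_alpha = attention_pool(H, W, b, z=[0.0, 0.0])
focused_sentence, focused_alpha = attention_pool(H, W, b, z=[1.0, 0.0])
```

With a zero attention vector both word pieces receive equal weight, while a vector aligned with the first projected state shifts weight toward the first word piece. The same pooling function could be reused at the document level with sentence vectors as input and separately learned parameters.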
Document encoding: The same attention framework defined in Equations 1 and 2 is used to calculate the sentence-level attention weights, $\alpha_s$. Separate attention weights and bias vectors are learned for the sentence and document encoders. The sentence representation is nonlinearly projected from size $l_h$ to $l_p$ as

$$\tilde{s}_j = \tanh(W_s s_j + b_s) \quad (6)$$

Each document is represented as the attention-weighted sum of the projected sentence vectors as

$$d_k = \sum_j \alpha_{s,j} \tilde{s}_j \quad (7)$$

where $\alpha_{s,j}$ is the attention weight for sentence $j$ and $d_k$ has size $l_p$. The classes for the infiltrates and extraparenchymal labels are {none, present, unilateral, bilateral}. Document label predictions are generated by linearly projecting the document vector from size $l_p$ to $l_d$ as

$$\phi_k = W_\phi d_k + b_\phi \quad (8)$$

where $\phi_k$ are the label scores for document $k$ and $l_d$ is the document label set size. Separate sentence and document encoders are implemented for the infiltrates and extraparenchymal document labels, utilizing a shared bi-LSTM input layer. For the prediction of the infiltrates document label, the binary sentence prediction targets include: parenchymal-unilateral, parenchymal-bilateral, and parenchymal-negation. For the prediction of the extraparenchymal document label, the binary sentence prediction targets include: extraparenchymal-unilateral, extraparenchymal-bilateral, and extraparenchymal-negation. To assess the contributions of the sentence-level objectives to document prediction performance, we implement HANSO without the sentence-level learning objective in Equation 5 ("HANSO lite") and the full HANSO model ("HANSO full").

HANSO is trained on the PAC training partition and evaluated here using two approaches: text-versus-text and text-versus-image. In the text-versus-text approach, the trained HANSO model is evaluated on the withheld PAC test set. The text-versus-text approach is a typical NLP performance evaluation, where the model is trained and evaluated on annotated text.
In the text-versus-image approach, we apply HANSO to a set of chest radiograph reports for which the associated chest radiograph images are directly annotated by an expert radiologist. The text-versus-image approach compares labels derived from text reports against gold standard image annotations.

In this subsection, we evaluate model performance on the withheld PAC test set. We present performance results for three models: SVM, HANSO lite, and HANSO full. To account for the variance associated with model random initialization, each model is trained on the training set 10 times and evaluated on the test set to generate a distribution of performance values. Table 3 presents the average performance across the 10 runs for the document labels, infiltrates and extraparenchymal. Performance is presented for each label and the micro average across labels ("micro"). HANSO full achieves the best overall performance (F1 micro) for both infiltrates and extraparenchymal with significance (p < 0.05), demonstrating that the inclusion of the sentence objectives contributes to document prediction performance. HANSO full achieves a statistically significant improvement in identifying bilateral infiltrates relative to the SVM (p < 0.01), and the improvement of HANSO full over HANSO lite narrowly misses the significance criterion (p = 0.07). Significance is assessed using a two-sided t-test with unequal variance. The average performance across all sentence-level tasks in the HANSO full runs is 0.83 F1. Considering the sentence-level tasks represent the summarization of the most salient relation information without including any span information, this performance is high. To assess the receiver operating characteristic (ROC) for HANSO's ability to identify bilateral infiltrates, we train a separate HANSO model (HANSO full) with binary document targets (0 = not bilateral and 1 = bilateral), where the none, present, and unilateral labels are mapped to not bilateral.
This binary variant achieves similar performance (0.88 F1) in identifying bilateral infiltrates as the multi-class models in Table 3. Figure 6 presents the ROC for bilateral infiltrates identification using the binary HANSO model. The ROC area under the curve (AUC) is 0.92. Optimizing the prediction threshold using Youden's J statistic yields J = 0.77 at FPR = 0.08 and TPR = 0.84.

Table 4: Comparison of radiograph image labels and radiograph report labels for infiltrates

In this subsection, we compare the manual PAC annotations and automatically generated HANSO labels against annotated chest radiograph images. An expert radiologist annotated 154 chest radiograph images from Data set A with quadrant-level consolidation scores that can be mapped to the infiltrates labels none, unilateral, and bilateral, which we treat as the gold standard labels in this section. The annotated radiograph images correspond to 44 annotated reports in PAC (35 train and 9 test). The manual report labels are evaluated for the 44 reports in PAC that have a corresponding annotated image. The HANSO labels are evaluated for the 119 (154-35) annotated radiograph images not associated with reports in the PAC train set. Table 4 presents the performance of the manual and HANSO infiltrates labels. While the performance for none and unilateral is lower for both the manual and HANSO labels, the performance for bilateral is high for both the manual labels (0.84 F1) and HANSO labels (0.87 F1). ARDS diagnosis requires the presence of bilateral infiltrates, so the bilateral label performance is most important. HANSO achieves a sensitivity (recall) of 0.85 and specificity of 0.75 for bilateral in a one-versus-rest evaluation (bilateral vs. not bilateral). As the manual and HANSO performance is assessed using different samples, the significance of the performance differences cannot be assessed.
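The threshold selection with Youden's J statistic and the one-versus-rest sensitivity and specificity reported above can be sketched as follows. Youden's J is defined as TPR minus FPR; everything else here (function names, the ROC points, and the confusion counts) is hypothetical and chosen only to illustrate the calculation, not taken from the study data.

```python
from typing import List, Tuple

# (threshold, false positive rate, true positive rate)
RocPoint = Tuple[float, float, float]


def youden_j(tpr: float, fpr: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1 = TPR - FPR."""
    return tpr - fpr


def best_operating_point(points: List[RocPoint]) -> RocPoint:
    """Pick the ROC point that maximizes Youden's J."""
    return max(points, key=lambda p: youden_j(p[2], p[1]))


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """One-versus-rest sensitivity (recall) and specificity from confusion counts."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ROC points, for illustration only.
roc = [(0.9, 0.02, 0.40), (0.5, 0.08, 0.84), (0.1, 0.60, 0.98)]
best = best_operating_point(roc)

# Hypothetical confusion counts for bilateral vs. not bilateral.
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=9, fp=1)
```

Applied to the rounded rates quoted above, the formula gives 0.84 - 0.08 = 0.76, close to the reported J = 0.77; the small difference is presumably due to rounding of the reported rates.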
We introduce a new annotated corpus of chest radiograph reports, PAC, which includes document, entity, and relation annotations associated with ARDS. We also introduce the multi-task, end-to-end HANSO classification framework, which hierarchically encodes documents by encoding the words in sentences and the sentences in a document. Inter-annotator agreement for the PAC document labels is high (0.90 F1). The agreement for the entities indicates the annotators are generally identifying the same phenomenon in the chest radiograph reports ("any overlap" agreement 0.85-0.96 F1), although there is variability in the bounds of the annotated spans for the entities ("partial match" agreement 0.49-0.92 F1). The annotation scheme defines entities with type and subtype labels that normalize each span and capture the information most relevant to ARDS, so the variability in the bounds of the span annotations does not materially impact the clinical meaning of the annotations. However, this variability makes span extraction (entity recognition) challenging. To leverage the entity and relation annotations, we introduce an approach for mapping relations to a one-hot encoding of the entity pairs in each relation and use this one-hot encoding of the relations to create a set of sentence classification tasks. The one-hot encoding captures the most important annotated relation information, without requiring the prediction of entity spans. The primary objective of the HANSO framework is the prediction of the document labels associated with ARDS; however, HANSO includes a secondary objective associated with the prediction of the sentence-level one-hot encodings of the relations. The inclusion of the sentence-level objective increases performance, with significance, for the document labels; infiltrates increases from 0.71 to 0.79 F1, and extraparenchymal increases from 0.74 to 0.80 F1. The presence of bilateral infiltrates is predicted with very high performance (0.87 F1).
HANSO predicts the one-hot encoded relations with high performance (0.83 F1), indicating the model is able to identify the key information from the fine-grained annotations without explicit knowledge of span information. HANSO also outperforms a strong SVM baseline from recent ARDS information extraction work, with significance. We also assess the performance of manual (human) and automatic HANSO chest radiograph report labels relative to annotated chest radiograph images. In the identification of bilateral infiltrates, HANSO achieves high performance against the annotated images (0.87 F1), which is comparable to the human performance (0.84 F1).

ARDS is a common complication of COVID-19 infection with high mortality; however, the identification of ARDS is often delayed or missed entirely. Delays in diagnosis lead to delays in evidence-based therapies that can improve clinical outcomes in COVID-associated ARDS. To our knowledge, this is the first study in COVID-19 that uses radiology reports to develop an automated NLP algorithm for identifying ARDS. These algorithms could be implemented in EHRs for the real-time surveillance of ARDS in COVID-19 infected populations with the goal of ensuring early implementation of evidence-based strategies for decreasing ARDS mortality. As next steps, we will complete an external validation of the data set and modeling framework to prepare for the deployment of an ARDS diagnostic tool. Specifically, the developed HANSO algorithm and trained model will be incorporated into a larger diagnostic tool that will be released and validated within the Electronic Medical Records and Genomics (eMERGE) Network to support pulmonary phenotyping efforts across different sites.

References
1. COVID-19 in critically ill patients in the Seattle region-case series.
2. Acute Respiratory Distress Syndrome Network. Ventilation with lower tidal volumes as compared with traditional tidal volumes for acute lung injury and the acute respiratory distress syndrome.
3. National Heart, Lung, and Blood Institute Acute Respiratory Distress Syndrome (ARDS) Clinical Trials Network. Comparison of two fluid-management strategies in acute lung injury.
4. Prone positioning in severe acute respiratory distress syndrome.
5. Epidemiology, patterns of care, and mortality for patients with acute respiratory distress syndrome in intensive care units in 50 countries.
6. Timing of low tidal volume ventilation and intensive care unit mortality in acute respiratory distress syndrome. A prospective cohort study.
7. Acute respiratory distress syndrome: the Berlin definition.
8. Deep learning for natural language processing in radiology-fundamentals and a systematic review.
9. Pneumonia identification using statistical feature selection.
10. Identification of patients with acute lung injury from free-text chest x-ray reports.
11. A computable phenotype for acute respiratory distress syndrome using natural language processing and machine learning.
12. External validation of an acute respiratory distress syndrome prediction model using radiology reports.
13. Understanding spatial language in radiology: representation framework, annotation, and spatial relation extraction from chest x-ray reports using deep learning.
14. Towards reliable ARDS clinical decision support: ARDS patient analytics with free-text and structured EMR data.
15. BRAT: a web-based tool for NLP-assisted text annotation.
16. Hierarchical attention networks for document classification.
17. Extracting COVID-19 diagnoses and symptoms from clinical text: A new annotated corpus and neural event extraction framework. arXiv preprint, under review.
18. spaCy: industrial-strength natural language processing in Python. Zenodo.
19. Publicly available clinical BERT embeddings.
20. PyTorch: an imperative style, high-performance deep learning library.
21. Scikit-learn: machine learning in Python.

This work was supported by NIH/NHGRI (1U01 HG-008657), NIH/NLM Biomedical and Health Informatics Training Program (5T15LM007442-19), and NIDDK K23DK116967 (PKB). We want to acknowledge Sudhakar Pipavath, MD, for his contributions to the annotation of the chest radiograph images used in this study. Research and results reported in this publication were partially facilitated by the generous contribution of computational resources from the University of Washington Department of Radiology.