key: cord-103363-efd80dgn
authors: Mahan, Margaret; Rafter, Daniel; Casey, Hannah; Engelking, Marta; Abdallah, Tessneem; Truwit, Charles; Oswood, Mark; Samadani, Uzma
title: tbiExtractor: A framework for extracting traumatic brain injury common data elements from radiology reports
date: 2019-03-21
journal: bioRxiv
DOI: 10.1101/585331
sha:
doc_id: 103363
cord_uid: efd80dgn

Objective: The manual extraction of valuable data from electronic medical records is cumbersome, error-prone, and inconsistent. By automating extraction in conjunction with standardized terminology, the quality and consistency of data utilized for research and clinical purposes would be substantially improved. Here, we set out to develop and validate a framework to extract pertinent clinical conditions for traumatic brain injury (TBI) from computed tomography (CT) reports.

Materials and Methods: We developed tbiExtractor, which extends pyConTextNLP, a regular expression algorithm using negation detection and contextual features, to create a framework for extracting TBI common data elements from radiology reports. The algorithm inputs radiology reports and outputs a structured summary containing 27 clinical findings with their respective annotations. Development and validation of the algorithm were completed using two physician annotators as the gold standard.

Results: tbiExtractor displayed high sensitivity (0.92-0.94) and specificity (0.99) when compared to the gold standard. The algorithm also demonstrated a high equivalence (94.6%) with the annotators. A majority of clinical findings (85%) had minimal errors (F1 score ≥ 0.80). When compared to annotators, tbiExtractor extracted information in significantly less time (0.3 sec vs 1.7 min per report).

Discussion and Conclusion: tbiExtractor is a validated algorithm for extraction of TBI common data elements from radiology reports. This automation reduces the time spent to extract structured data and improves the consistency of the data extracted.
Lastly, tbiExtractor can be used to stratify subjects into groups based on visible damage by partitioning the annotations of the pertinent clinical conditions on a radiology report.

Fig 1. Graphical outline of the methods. Purple rectangles correspond to methods subsections, meaning they represent steps in the processing workflow; orange parallelograms represent data; blue diamonds represent binary decisions on data; gray rectangles represent excluded data; and green isosceles trapezoids correspond to subcomponents of the algorithm.

Data Capture and Cleaning

Hospital admission radiology reports from non-contrast head CT scans were extracted from EMRs for subjects participating in the CLASSIFY-TBI study (details in S1 Protocol). Each radiology report was converted to a spaCy [26] container for assessing linguistic annotations and partitioned into sentences. Sentences before the "Findings" and after the "Impressions" sections were removed. Then, the sentences were concatenated, with newline characters replaced with a space, symbols removed, and whitespace stripped. Radiology reports that did not contain "Findings" or "Impressions" sections were removed, along with radiology reports containing multiple scan types.

Using the scikit-learn [27] TfidfVectorizer, the corpus was converted into a matrix of TF-IDF (term frequency times inverse document frequency) features using n-grams with n ranging from one to ten. Cosine similarities were calculated between each pair of radiology reports by multiplying the TF-IDF matrix by its transpose. Using the cosine similarity for each pair of radiology reports, one radiology report was randomly selected, and all radiology reports with at least 0.70 cosine similarity to that report were collected into a set. From this set, one radiology report was randomly selected to keep for further analysis and the remainder were removed.
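The similarity-based removal described here can be sketched with a small standard-library program. This is a simplification, not the study's code: it uses word n-grams up to 2 (rather than 10), the smoothed IDF weighting used by scikit-learn's TfidfVectorizer defaults, and a greedy pass in place of the random-selection procedure; the example reports are invented.

```python
import math
import re
from collections import Counter

def ngrams(text, n_max=2):
    """Word n-grams (1-2 here for brevity; the paper used n from 1 to 10)."""
    words = re.findall(r"[a-z]+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams

def tfidf_vectors(docs):
    """Map each document to an L2-normalized sparse TF-IDF vector {term: weight}."""
    counts = [Counter(ngrams(d)) for d in docs]
    df = Counter(t for c in counts for t in c)  # document frequency per term
    n = len(docs)
    vecs = []
    for c in counts:
        # Smoothed IDF, as in scikit-learn's default: ln((1+n)/(1+df)) + 1
        v = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in v.values()))
        vecs.append({t: w / norm for t, w in v.items()})
    return vecs

def cosine(u, v):
    """Dot product of normalized vectors = cosine similarity."""
    return sum(w * v[t] for t, w in u.items() if t in v)

def deduplicate(docs, threshold=0.70):
    """Keep one representative from each group of near-duplicate reports."""
    vecs = tfidf_vectors(docs)
    removed = set()
    for i in range(len(docs)):
        if i in removed:
            continue
        for j in range(i + 1, len(docs)):
            if j not in removed and cosine(vecs[i], vecs[j]) >= threshold:
                removed.add(j)
    return [d for k, d in enumerate(docs) if k not in removed]
```

With three toy reports, the two near-identical negative reads collapse to one representative while the distinct positive read is retained.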
This was applied recursively for each set until each radiology report was either retained for further analysis or marked for removal. The purpose of this removal was to reduce the data requiring human annotation. Details in S2 Appendix.

Dataset Partitioning

A shuffled deck of three numbers, the same size as the number of radiology reports retained for analysis, was created. The three numbers represented the proportion of radiology reports to be assigned to each of the datasets: 10% initialization, 40% training, and 50% validation. From the set of radiology reports retained for analysis, one radiology report was randomly selected along with up to the three most similar radiology reports, based on cosine similarity. From this subset, each radiology report was assigned the next number in the shuffled deck. This was applied recursively until each radiology report was assigned to one dataset. The initialization dataset was solely used for training annotators and was not used by the algorithm.

As a lexicon-based method, pyConTextNLP inputs tab-separated files for lexical targets (indexed events) and lexical modifiers (contextual features). It then converts these into itemData, which contains a literal, category, regular expression, and rule (the latter two are optional). The literal, belonging to a category (e.g., ABSENT), is the lexical phrase (e.g., "is negative") in the text. The regular expression allows for variant text phrases (e.g., "was negative") giving rise to the same literal and is generated from the literal if not provided. Further, the rule provides context to the span of the literal (e.g., backward). For text data, pyConTextNLP marks the text with lexical modifiers and lexical targets according to their representative itemData. The pyConTextNLP algorithm outputs a directional graph via NetworkX [29], which represents these markups.
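The itemData structure and the marking of text can be illustrated with a minimal sketch. This is not the pyConTextNLP API or the tbiExtractor lexicons: the two targets, three modifiers, and the nearest-modifier linking rule below are invented stand-ins that mimic the four-field (literal, category, regular expression, rule) entries and the default-ABSENT behavior described in the text.

```python
import re

# Hypothetical lexicon entries in the (literal, category, regex, rule) style.
LEXICAL_TARGETS = [
    ("microhemorrhage", "MICROHEMORRHAGE", r"microhemorrhage(s)?", ""),
    ("skull fracture", "SKULL_FRACTURE", r"skull fracture(s)?", ""),
]
LEXICAL_MODIFIERS = [
    ("is negative", "ABSENT", r"(is|was) negative", "backward"),
    ("no evidence of", "ABSENT", r"no evidence of", "forward"),
    ("acute", "PRESENT", r"acute", "forward"),
]

def mark_sentence(sentence, default="ABSENT"):
    """Link each lexical target found in a sentence to the nearest lexical
    modifier; a target with no modifier falls back to the default annotation."""
    findings = {}
    for _, category, regex, _ in LEXICAL_TARGETS:
        target = re.search(regex, sentence, re.I)
        if not target:
            continue
        best, best_dist = default, None
        for _, annotation, mod_regex, _ in LEXICAL_MODIFIERS:
            mod = re.search(mod_regex, sentence, re.I)
            if mod:
                dist = abs(mod.start() - target.start())
                if best_dist is None or dist < best_dist:
                    best, best_dist = annotation, dist
        findings[category] = best
    return findings
```

For example, "There is no evidence of microhemorrhage." links the microhemorrhage target to the ABSENT modifier, while a sentence naming a target with no modifier in range receives the default annotation.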
Nodes in the graph represent the concepts (i.e., lexical modifiers and lexical targets) in the text, and edges in the graph represent the relationships between the concepts. The following three subsections describe the details used for extending pyConTextNLP.

Lexical modifiers belong to categories including PRESENT, ABSENT, NORMAL, and ABNORMAL. Henceforth, the term "annotation" will be used when referencing the category, to maintain consistency between annotator and algorithm vocabulary.

Lexical targets were adapted from the common data elements in radiologic imaging of TBI [3]. In deriving the lexical targets, the literal represents a clinical condition relevant to TBI on a non-contrast head CT scan (e.g., microhemorrhage), and the category, in this study, is the same (e.g., MICROHEMORRHAGE). The regular expression for each literal (e.g., microhemorrhage(s)?) was added and updated during the training stage. Two examples (Fig 2 and Fig 3) are provided for a detailed explanation of the application of lexical modifiers and lexical targets during the algorithm process. At this stage of processing, each sentence in the radiology report is marked with lexical targets and linked lexical modifiers, with one lexical modifier assigned to each lexical target.

There were 438 radiology reports extracted: 1 was removed because it did not have both "Findings" and "Impressions" sections, 20 were removed because they contained more than one scan type, and 106 were removed for high cosine similarity. The remaining 311 reports were split into initialization, training, and validation datasets (Table 1). In the training dataset, annotators took an average of 2.84 minutes per radiology report. Between 15% and 16% of annotations across radiology reports were selected from default (Table 2). In the validation dataset, annotators took an average of 1.67 minutes per radiology report.
Similar to the training dataset, 16% of annotations across radiology reports were selected from default (Table 2). For the validation dataset, there was high equivalence in annotations between the annotators (N = 4072), with an additional 598 similar annotations and only 87 divergent annotations. Overall, the two annotators were in high agreement (κ = 0.913). Annotations on which the annotators were equivalent were considered the gold standard (Fig 4, dashed line). The evaluation revealed high performance across all metrics (Table 3).

Six false positives were produced for intracranial pathology and four for hemorrhage (NOS), meaning tbiExtractor identified these lexical targets as PRESENT while the annotators marked them as ABSENT. This is due to the derivation of these lexical targets in relation to other lexical targets (i.e., if extraaxial fluid collection is PRESENT, then by decision rules, so is intracranial pathology). The remaining lexical targets produced fewer false positives. Overall, the errors are minimal, as measured by the high F1 scores for the majority of lexical targets.

Further examination of divergent cases (i.e., annotators annotated ABSENT and tbiExtractor annotated PRESENT, or vice versa) revealed the most common diverged lexical targets to be intracranial pathology, facial fracture, intraparenchymal hemorrhage, hemorrhage (NOS), and herniation. The remaining lexical targets exhibited fewer than four diverged responses. The most commonly diverged lexical target was the derived-from-decision-rules intracranial pathology, indicating that most errors were from tbiExtractor missing the lexical targets outright. In most divergent cases where this was not the reason, the sentences had more complex structures.

First, cosine similarities across the four subsets of data were not different, indicating a normal, albeit slender, distribution of radiology report similarity.
Second, the average number of sentences in each radiology report approached the minimum, indicating a right-skewed distribution in which the majority of radiology reports have low numbers of sentences. The same holds true for the number of words. Taken together, this could be reflective of the findings generally found in CT reports on TBI subjects, where the prevalence of CT findings is less than 10% in mild TBI cases.

In cases where annotators were not equivalent, data entry issues tended to be the culprit. Mostly, this was a result of overlooking the lexical target and not selecting an annotation different from the default. The overlooking could be a result of annotator fatigue, which may be attributed to the length and/or complexity of the radiology report. There was also a difference in whether "parenchymal contusion" was considered an intraparenchymal hemorrhage. However, the differences between the annotators were minimal and therefore provided a valid gold standard to develop and validate tbiExtractor.

Standard assessment metrics for evaluating tbiExtractor were exceptionally high, demonstrating the utility of the algorithm for extracting accurate clinical conditions relevant to TBI research. One particularly error-prone case was facial fracture. Often, radiology reports with facial fractures are lengthy and involve compound sentence structures, which are missed by the regular expressions and span pruning. Second, there were several cases where the lexical modifier was absent or at a greater distance than another lexical modifier. For example, "cerebellar volume loss" would indicate atrophy is PRESENT, but in this sentence there is no lexical modifier available, which results in a default lexical modifier of ABSENT. Third, there were cases where derived lexical targets were not accurately annotated by tbiExtractor, for example when tbiExtractor annotated extraaxial fluid collection to be ABSENT.
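The evaluation quantities reported above (sensitivity, specificity, F1 score, and Cohen's κ for inter-annotator agreement) can be computed from paired annotation lists. This is a generic stdlib sketch of the standard formulas, not the study's evaluation code; the example labels are invented.

```python
from collections import Counter

def binary_metrics(gold, predicted, positive="PRESENT"):
    """Per-finding sensitivity, specificity, and F1 against a gold standard."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    tn = sum(g != positive and p != positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return sensitivity, specificity, f1

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if expected == 1:  # degenerate case: both annotators used a single label
        return 1.0
    return (observed - expected) / (1 - expected)
```

Applied per lexical target over the validation reports, these functions yield the kind of per-finding metrics summarized in Table 3.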
Further examination of these errors is an avenue for future work. The dataset used for this study was from a single institution, which limits the style of radiology reports and decreases heterogeneity in the sample. Furthermore, the dataset was limited in size, as there were only two annotators available for annotation. In addition, there were data entry issues from extracting the radiology reports from the EMRs. For example, a subsequent radiology report was used instead of the admission report. Lastly, the only scan considered in this dataset is the admission non-contrast head CT. With the nature of TBIs, some visible pathologies are only seen on follow-up CTs. tbiExtractor was developed to automate the extraction of TBI common data elements from radiology reports.

References

Multivariable prognostic analysis in traumatic brain injury: Results from the IMPACT study
Optimizing clinical research participant selection with informatics
Common data elements in radiologic imaging of traumatic brain injury
Common data elements in radiology
The diagnosis of head injury requires a classification based on computed axial tomography
Prediction of outcome in traumatic brain injury with computed tomographic characteristics: A comparison between the computed tomographic classification and combinations of computed tomographic predictors
What can natural language processing do for clinical decision support
Natural language processing in pathology: A systematic review
Natural language processing technologies in radiology research and clinical applications
Natural language processing: An introduction
Automated encoding of clinical documents based on natural language processing
Medication extraction from electronic clinical notes in an integrated health system: A study on aspirin use in patients with nonvalvular atrial fibrillation
Extracting principal diagnosis, co-morbidity and smoking status for asthma research: Evaluation of a natural language processing system
Identifying primary and recurrent cancers using a SAS-based natural language processing algorithm
Automated outcome classification of emergency department computed tomography imaging reports
Automated outcome classification of computed tomography imaging reports for pediatric traumatic brain injury
Information extraction from multi-institutional radiology reports
Natural language processing techniques for extracting and categorizing finding measurements in narrative radiology reports
A natural language processing pipeline for pairing measurements uniquely across free-text CT reports
Assessing the feasibility of an automated suggestion system for communicating critical findings from chest radiology reports to referring physicians
Characterization of change and significance for clinical findings in radiology reports through natural language processing
Python Software Foundation
Data structures for statistical computing in python
A guide to NumPy
Open source scientific tools for Python
Scikit-learn: Machine learning in python
Exploring network structure, dynamics, and function using NetworkX
Proceedings of the 7th Python in Science Conference
A 2D graphics environment
A simple algorithm for identifying negated findings and diseases in discharge summaries
ConText: An algorithm for identifying contextual features from clinical text
Context: An algorithm for determining negation, experiencer, and temporal status from clinical reports
Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm
Inter-coder agreement for computational linguistics
A coefficient of agreement for nominal scales
Imaging evidence and recommendations for traumatic brain injury: Conventional neuroimaging techniques
Prospective validation of a proposal for diagnosis and management of patients attending the emergency department for mild head injury
Diagnostic procedures in mild traumatic brain injury: results of the WHO collaborating centre task force on mild traumatic brain injury
Mild head injury - mortality and complication rate: Meta-analysis of findings in a systematic literature review
Indications for computed tomography in patients with minor head injuries
Epidemiology of traumatic brain injury
Congenital and acquired brain injury. 1. Epidemiology, pathophysiology, prognostication, innovative treatments, and prevention. Archives of Physical Medicine and Rehabilitation