key: cord-0283793-7mtqw3ur
authors: Schirle, L.; Jeffery, A. D.; Yaqoob, A.; Sanchez-Roige, S.; Samuels, D. C.
title: Two Data-Driven Approaches to Identifying the Spectrum of Problematic Opioid Use: A Pilot Study within a Chronic Pain Cohort
date: 2021-09-12
journal: nan
DOI: 10.1101/2021.09.07.21263079
sha: 6c9eeb88346481071880d2c118ea5ea45972f9d7
doc_id: 283793
cord_uid: 7mtqw3ur

Background: Although electronic health records (EHR) have significant potential for the study of opioid use disorders (OUD), detecting OUD in clinical data is challenging. Models using EHR data to predict OUD often rely on case/control classifications focused on extreme opioid use. There is a need to expand this work to characterize the spectrum of problematic opioid use.

Methods: Using a large academic medical center database, we developed 2 data-driven methods of OUD detection: (1) a Comorbidity Score developed from a Phenome-Wide Association Study of phenotypes associated with OUD and (2) a Text-based Score using natural language processing to identify OUD-related concepts in clinical notes. We evaluated the performance of both scores against a manual review using correlation coefficients, Wilcoxon rank sum tests, and the area under the receiver operating characteristic curve (AUC). Records with the highest Comorbidity and Text-based scores were re-evaluated by manual review to explore discrepancies.

Results: Both the Comorbidity and Text-based OUD risk scores were significantly elevated in the patients judged as High Evidence for OUD in the manual review compared to those with No Evidence (p = 1.3E-5 and 1.3E-6, respectively). The risk scores were positively correlated with each other (rho = 0.52, p < 0.001). AUCs for the Comorbidity and Text-based scores were high (0.79 and 0.76, respectively). Follow-up manual review of discrepant findings revealed strengths of data-driven methods over manual review, and opportunities for improvement in risk assessment.
Conclusion: Risk scores comprising comorbidities and text offer differing but synergistic insights into characterizing problematic opioid use. This pilot project establishes a foundation for more robust work in the future.

Despite aggressive increases in opioid epidemic research funding [1], U.S. opioid overdose deaths continue to rise [2, 3]. Retrospective observational studies are valuable research tools for examining epidemiology, disease progression, and treatment effectiveness [4]; however, their use in opioid use disorder (OUD) research is hampered by difficulties in detecting OUD in electronic health record (EHR) data. Providers are often reluctant to document concerns about opioid use in health records due to the stigmatizing nature of diagnoses, potential difficulties in future pain management, fear of misclassification, and poorly defined diagnostic criteria [5-10]. Therefore, standard approaches for identifying cases in EHR data, such as International Classification of Diseases (ICD) codes or problem lists, insufficiently capture OUD [11-13].

Several existing methods have utilized EHR data for OUD prediction [14-20]. Some models identify OUD cohorts using ICD codes [16, 17], but this approach likely underrepresents problematic opioid use [13]. Other models used unstructured clinical notes text [14, 15, 18, 19]. Although useful, both methods characterize problematic opioid use in a binary fashion, missing the nuance that problematic opioid use occurs not as a present/absent dichotomy but on a continuum of severity.

Other studies employ models that do capture the continuum of problematic opioid use, but outside of EHR data. For example, one study used machine learning methods to produce a continuous measure of OUD risk in Medicare data purchased through the Centers for Medicare and Medicaid Services (CMS) [22].
These studies provide indispensable insight into the continuum of problematic opioid use, but come at the expense of poor reproducibility in most EHR systems [13, 20-22].

medRxiv preprint doi: https://doi.org/10.1101/2021.09.07.21263079; this version posted September 12, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

To overcome this limitation, a recent study employed machine learning approaches to produce a continuous measure of problematic opioid use risk using readily accessible inpatient EHR data [23]. Our study expands this foundational work by employing two data-driven methods to assess the continuum of problematic opioid use using different data sources (ICD codes and clinical notes) in a large sample of readily available EHR data comprising all encounters for chronic pain patients. The first method uses a phenome-wide association study (PheWAS) of phenotypes significantly associated with OUD ICD codes to produce an OUD comorbidity risk score. The second method uses natural language processing (NLP) to produce a text-based score that identifies OUD-related concepts in available clinical notes. Methods to detect the continuum of problematic opioid use in readily available EHR data would significantly enhance the ability to conduct retrospective opioid research critical to improving OUD detection and treatment. The objective of this study was to evaluate whether combining data-driven OUD comorbidities and EHR text could serve as a new framework to identify a continuum of problematic opioid use.

Material and Methods

Data from Vanderbilt University Medical Center's de-identified biorepository, BioVU, linked to over 20 years of clinical records [24], were extracted between 8/2018 and 6/2021.
We developed the ICD-based OUD comorbidity score and the text-based Concept Unique Identifier (CUI) score in independent BioVU participant subgroups. To develop the text-based score and evaluate final performance, we chose individuals with a diagnosis of chronic pain because of the higher incidence of opioid use and OUD in this group than in general populations [25]. We evaluated our methods against gold-standard manual review in a holdout test set (Figure 1).

For the OUD PheWAS, we used a cohort of Caucasian BioVU participants (N=29,868), as this available dataset was created for genetics analyses, independent of the analyses reported here. We required a minimum age at end of the medical record of 20 years and a minimum 3-year length of record [26, 27]. Requiring a minimum length of record is a common PheWAS practice to improve data depth for each individual [28, 29]. This cohort was 42% male, with average record length of 12 years (IQR 7.5-15.4 years) and average age at end of medical record of 59 years.

OUD was defined by presence of relevant ICD9 or ICD10 codes (Supplemental Table 1). Individuals with at least one OUD ICD code were classified as a PheWAS OUD case. In this cohort, the OUD rate was 2.1%. PheWAS phenotype categories were defined through presence of ICD codes as defined in Wei et al. [30].
Phenotype categories were tested for association with OUD by logistic regression adjusting for sex, age at final record, record duration, and the first 5 genetic principal components to adjust for population substructure. We tested 1,356 PheWAS categories in separate logistic models. All calculations were carried out in R version 3.6.3.

Table 1 lists the top 24 associated phenotypes, sorted by p-value. The phenotype most significantly associated with OUD was "Substance Addiction and Disorders", a broad category including the ICD codes used to define OUD. Since our intention was to define a comorbidity score for OUD, we excluded this phenotype category and all other substance use disorder categories (Table 1). The remaining top 20 associated phenotypes were used to define the comorbidity score. Interestingly, all phenotypes in this score were either pain or mental illness phenotypes.

All ICD9 and ICD10 codes mapping to the 20 PheWAS codes in Table 1 were extracted from all subjects in BioVU. Each person was classified as a case or control for each of these 20 phenotypes. The comorbidity score was calculated as a weighted linear sum over these 20 phenotypes, using the beta values from the PheWAS as weights. The maximum comorbidity score in this cohort (51.509) was used to normalize the comorbidity scores to range from 0 to 1.

To test reproducibility and transferability of the PheWAS results (Table 1) to other independent cohorts and other racial groups, we repeated the association tests for these phenotypes, defined by lists of ICD codes (Supplemental Table 1), in two additional cohorts. First, we repeated association tests in an independent cohort of 13,508 Caucasians over age 20 at the end of their medical record and with at least 3 years of medical record. All 20 tested phenotype associations with OUD replicated in this additional cohort with significant associations (minimum
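As a rough illustration of the comorbidity score described in this section (a weighted sum of PheWAS beta values over the phenotypes for which a person is a case, divided by the cohort maximum), the calculation might be sketched as follows. This is a minimal sketch, not the authors' R code: the phenotype names, beta values, and patient identifiers are hypothetical placeholders, and the sketch assumes all weights are non-negative so that dividing by the maximum yields values between 0 and 1.

```python
from typing import Dict, Set


def raw_comorbidity_score(case_phenotypes: Set[str], betas: Dict[str, float]) -> float:
    """Sum the PheWAS beta of every scored phenotype the person is a case for."""
    return sum(beta for phenotype, beta in betas.items() if phenotype in case_phenotypes)


def normalized_comorbidity_scores(
    cohort: Dict[str, Set[str]], betas: Dict[str, float]
) -> Dict[str, float]:
    """Divide each raw score by the cohort maximum (assumes non-negative betas)."""
    raw = {pid: raw_comorbidity_score(phens, betas) for pid, phens in cohort.items()}
    max_raw = max(raw.values(), default=0.0)
    if max_raw <= 0:
        # No positive score anywhere in the cohort: nothing to scale by.
        return {pid: 0.0 for pid in raw}
    return {pid: score / max_raw for pid, score in raw.items()}


# Hypothetical weights and patients, for illustration only (not values from Table 1).
example_betas = {"chronic_pain": 1.5, "anxiety_disorder": 1.0, "back_pain": 0.5}
example_cohort = {
    "p1": {"chronic_pain", "anxiety_disorder", "back_pain"},
    "p2": {"back_pain"},
    "p3": set(),
}

scores = normalized_comorbidity_scores(example_cohort, example_betas)
for pid, score in sorted(scores.items()):
    print(pid, round(score, 3))
```

In this toy cohort the person with all three phenotypes defines the maximum and therefore scores 1; in the study, the corresponding normalizing constant was the cohort maximum of 51.509.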
We extracted clinical notes from 30 days before the patient's first ICD-9 code related to chronic pain (i.e., 338.2, 338.21, 338.22, 338.28, 338.29) through 30 days after the last ICD-9 code related to chronic pain. We excluded patients with only 1 ICD-9 code related to chronic pain. For computational feasibility, we restricted notes to Observational Medical Outcomes Partnership (OMOP) note type identifiers 44814645 ("Note") and 44814640 ("Outpatient Note") and further restricted to those notes containing words related to variations of: pain, opioid, expansions of narcot-, and expansions of addict-.

We processed the resulting 308,264 notes using ScispaCy, an open-source natural language processing library that adapts routine natural language processing to accommodate biomedical text [31]. We used ScispaCy for sentence detection, abbreviation expansion, named-entity recognition, and negation detection [32, 33]. Following recognition of named entities, we used ScispaCy's EntityLinker component to map entities (i.e., words and phrases) to the Unified Medical Language System (UMLS) standardized vocabulary's concept unique identifiers (CUIs). As an example, imagine a brief clinical note: "Patient presents for acute pain in R knee. No history of opioid abuse. Prescribing oxycodone." ScispaCy would separate this note into 7 named entities that map to CUIs.
The resulting representation would be: C0030705 (patients), C0184567 (acute onset pain), C0230431 (structure of right knee), C0332122 (no history of-negated), C0029095 (opioid abuse-negated), C0278329 (prescribed), and C0030049 (oxycodone). In addition to routine stop-words (e.g., it, the, and), we removed 17 ambiguously mapped concepts (e.g., the word "met" in the frequent context of "goals met" was mapped to C0025646 for "methionine") (Supplemental Table 3).

To represent the relative importance of a concept within a patient's corpus of notes, we calculated Term Frequency-Inverse Document Frequency (TF-IDF) scores for each non-negated CUI for each patient [28, 29]. We explored the CUIs with the 50 highest TF-IDF values both in patients labeled as cases from ICD codes (Supplemental Table 1) and in those labeled as controls. Subject matter experts (L.S., a nurse anesthetist and opioid researcher, and S.S-R., a substance use disorder and genetics researcher) compared top CUIs found only in cases versus top CUIs found only in controls. Table 2 lists top-scoring CUIs for cases, along with whether subject matter experts identified the concept as valid. Top-scoring CUIs included terms such as 'methadone' and 'Suboxone'. For each CUI identified as valid, we added 1 point when a patient's TF-IDF value for that CUI was larger than the mean TF-IDF value across all patients. We performed a similar process for control CUIs (Supplemental Table 4). However, the inclusion of control data increased noise, and score performance decreased considerably. Therefore, we removed control data from the scores.
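The point-based text score described above can be illustrated with a small, self-contained sketch. Several details here are assumptions rather than the authors' implementation: the TF-IDF variant (relative term frequency multiplied by the natural log of the inverse patient frequency), the treatment of a CUI absent from a patient's notes as a TF-IDF of 0 when computing the across-patient mean, and the placeholder concept labels, which are hypothetical stand-ins and not real UMLS CUIs.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def tfidf_by_patient(patient_cuis: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """TF-IDF per patient, treating each patient's non-negated CUIs as one document."""
    n_patients = len(patient_cuis)
    doc_freq: Counter = Counter()
    for cuis in patient_cuis.values():
        doc_freq.update(set(cuis))  # count each CUI once per patient
    tfidf: Dict[str, Dict[str, float]] = {}
    for pid, cuis in patient_cuis.items():
        counts = Counter(cuis)
        total = sum(counts.values())
        tfidf[pid] = {
            cui: (count / total) * math.log(n_patients / doc_freq[cui])
            for cui, count in counts.items()
        }
    return tfidf


def text_scores(tfidf: Dict[str, Dict[str, float]], valid_cuis: Set[str]) -> Dict[str, int]:
    """Add 1 point per expert-validated CUI whose TF-IDF exceeds the across-patient mean."""
    if not tfidf:
        return {}
    points = {pid: 0 for pid in tfidf}
    for cui in valid_cuis:
        mean_value = sum(values.get(cui, 0.0) for values in tfidf.values()) / len(tfidf)
        for pid, values in tfidf.items():
            if values.get(cui, 0.0) > mean_value:
                points[pid] += 1
    return points


# Hypothetical concept labels standing in for CUIs (illustration only).
example_notes = {
    "p1": ["CUI_METHADONE", "CUI_METHADONE", "CUI_PAIN"],
    "p2": ["CUI_PAIN", "CUI_PAIN"],
    "p3": ["CUI_SUBOXONE", "CUI_PAIN", "CUI_METHADONE", "CUI_PAIN"],
}
example_valid = {"CUI_METHADONE", "CUI_SUBOXONE"}

scores = text_scores(tfidf_by_patient(example_notes), example_valid)
print(scores)
```

Because a concept that appears for every patient receives an inverse document frequency of zero under this variant, ubiquitous concepts such as the pain placeholder contribute nothing, which is the behavior TF-IDF is chosen for.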
To determine evidence of problematic opioid use, a randomly selected subset of 100 patients from a holdout set of the chronic pain cohort underwent manual record review for comparison with data-driven methods. One record did not contain sufficient data to calculate a text-based score and was excluded from manual review. We reviewed records using a keyword template developed from keywords in the Diagnostic and Statistical Manual of Mental Disorders, 5th Ed. (DSM-5) criteria for OUD [34], the Addiction Behaviors Checklist [35], and previous studies describing problematic opioid use detection in EHRs [15, 19, 20]. Periodic interim analyses assessed word performance, and we trimmed duplicate words (e.g., "detox" and "tox screen" to "tox", and "multiple providers" and "multiple prescribers" to "multiple pr"). Supplemental Table 5

We conducted statistical analyses in R v3.6.3 and in Python 3.8.5. We used Spearman's rho rank correlation coefficient to examine correlations between each scoring system and manual review, as well as the correlation between scoring systems. Comparisons of OUD comorbidity scores and text-based risk scores between manual review categories were carried out by one-sided Wilcoxon rank sum test with continuity correction. We used an area under the

Figure 2A). Comorbidity scores between High and Some Evidence groups were also different, with the High Evidence group having higher comorbidity scores (p = 0.039). Similar patterns were observed in the text-based scores (Figure 2A). Categories
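The rank-based statistics named in the statistical analysis paragraph above (Spearman's rho and the area under the ROC curve) can be illustrated with a short standard-library sketch. It relies on the general identity that the AUC equals the Mann-Whitney U statistic, which underlies the Wilcoxon rank sum test, divided by the product of the two group sizes. The scores and labels are made-up values, and the coding of manual review as 0 (No), 1 (Some), 2 (High) and its dichotomization into positive and negative labels are assumptions for illustration; this is not the authors' analysis code.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


def auc_from_rank_sum(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = U / (n_pos * n_neg), where U comes from the positives' rank sum."""
    ranks = average_ranks(scores)
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    rank_sum_pos = sum(r for r, label in zip(ranks, labels) if label == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Made-up risk scores and review categories (0 = No, 1 = Some, 2 = High evidence).
example_scores = [0.1, 0.4, 0.35, 0.8]
example_review = [0, 0, 1, 2]
example_labels = [1 if category > 0 else 0 for category in example_review]

rho = spearman_rho(example_review, example_scores)
auc = auc_from_rank_sum(example_scores, example_labels)
print(round(rho, 3), auc)
```

Averaging tied ranks matters here because the manual review categories are ordinal and heavily tied, so many patients share the same rank.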
The manual review categories for High/Some evidence for OUD were positively correlated with the comorbidity (rho = 0.49, p < 0.001) and text-based scores (rho = 0.56, p < 0.001) (Figure 3). Comorbidity and text-based scores were also positively correlated with each other (rho = 0.52, p < 0.001).

Opioid Use

To evaluate the ability of comorbidity and text-based risk scores to detect problematic opioid use, we compared both risk scores to the manual review in the 99 individuals in the holdout test set (Figure 4). The text-based score achieved an AUC of 0.79, and the comorbidity score achieved an AUC of 0.76, both indicating moderate-to-high performance.

To investigate concordance between comorbidity scores, text-based scores, and manual review results, we considered the individuals scoring in the top quintile of comorbidity scores, and the top quintile of text-based scores, for further follow-up manual review (Figure 5). Table 3 details post-hoc manual review results.

In this pilot study, we developed and tested two data-driven methods to detect OUD in EHR data that helped us characterize the continuum of problematic opioid use. This approach advances existing methods by providing additional benefits surpassing gold-standard manual review.
In contrast to a manual chart review, our methods increase the objectivity of EHR reviews and could be transferable to other health care systems with access to ICD codes and clinical notes.

Our primary motivation was capturing the continuum of problematic opioid use by assessing indicators of risk for OUD rather than classifying OUD. Using these data-driven methods, we identified individuals with high scores who had only limited evidence of OUD in medical records. Notably, these patients were long-term opioid users with indications of potential problematic opioid use but lacked the DSM-5 signs of compulsive use characteristic of OUD. These individuals may represent a group of chronic pain patients with Complex Persistent Opioid Dependence (CPOD) [7-9]. CPOD, the gray area between opioid dependence and addiction, develops slowly, almost imperceptibly, with long-term opioid exposure [7]. By assessing problematic opioid use risk using a continuous score, these data-driven approaches may identify signs of impending problematic opioid use indiscernible to human clinicians.

An additional advantage of our method is decreased reliance on human data interpretation. For example, two individuals with high data-driven scores were subsequently reassessed by manual review from Some to High Evidence for OUD. In both cases, data critical to the human reviewer's determination of OUD were obscured in records typically not searched in manual review procedures (e.g., phone and intra-provider communications). These data-driven methods rely on agnostic processes not dependent on documented clinician concern for problematic opioid use.
Therefore, our method potentially adds to, and expedites, the existing dictionary-based approaches to OUD identification within text [36].

Another major strength of our approach is scalability (the ability to evaluate scores quickly over a large number of records). Manual review, the gold standard against which OUD EHR detection methods are typically evaluated, is extremely labor-intensive, limiting clinical and large-scale research use. Using ICD codes and clinical text, both scores can be adapted to any health system EHR, accommodating regional or system-wide contextual idioms. By using reproducible automated methods on data available in most electronic health systems, these methods have the potential to enhance generalizability.

This work is not without limitations. Due to computational constraints, only text from two note types was included in the analysis; valuable information may be captured outside the two note types we explored (e.g., clinical communications). The limited size of the CUI training set may constrain the concepts identified and limit transferability across different health systems. Similarly, decisions made in scoring system development could alter performance. For example, one TF-IDF threshold was set to exclude words found in more than 95% of patients (words found in almost all documents are unlikely to be discriminating). A threshold of 90% or 99% may perform better. Future work to explore threshold influences on scoring system performance is planned. Furthermore, our dependence on a single institution and two types of data could limit our identification of OUD. Additional factors could be used to
define a broader cohort (e.g., positive urine screens), but were out of the scope of this project due to limited data available. In addition, we attempted to develop risk scores that captured low or no risk for OUD, but our methods were unable to accurately identify controls. Identifying OUD controls in EHR data is a major limitation of the opioid research field [37]. Interesting insights from the top control CUIs included physical therapy and exercise references (Supplemental Table 4), which will guide future approaches to detect low-risk individuals. Lastly, our approach can detect individuals with high probability of opioid misuse. Future studies should examine a broader non-chronic pain population and additional datasets to assess the base rate and dynamics of OUD in other populations, to identify individuals with milder risk, and to determine whether these scores can also accurately identify negative cases.

As we acquire larger and more diverse data, we see the scoring systems described here as the first step in developing a clinical decision support tool that could notify clinicians of patients at risk for OUD. We acknowledge concerns that algorithmic approaches to classifying opioid use may lead to medical discrimination [38]; however, the risk that chronic opioid therapy patients will develop problematic opioid use is a prominent concern for clinicians. Identification of these individuals for vigilant monitoring and alternative pain management techniques may be of value in preventing transition from opioid use to OUD.
and probable opioid misuse among recipients of chronic opioid therapy in commercial and Medicaid insurance plans: The TROUP Study. Pain. 2010;150(2):332-9.
Helping to End Addiction Long-term Initiative
Trends and geographic patterns in drug and synthetic opioid overdose deaths -United States
As COVID-19 surges, AMA sounds alarm on nation's overdose epidemic AMA Connect
Big Data-and its contributions to peri-operative medicine
Classification and identification of opioid addiction in chronic pain patients
Evaluation and comparison of tools for diagnosing problematic prescription opioid use among chronic pain patients. International journal of methods in psychiatric research
Complex persistent opioid dependence with long-term opioids: a gray area that needs definition, better understanding, treatment guidance, and policy changes
The conundrum of opioid tapering in long-term opioid therapy for chronic pain: A commentary
Opioid dependence vs addiction: a distinction without a difference?
Defining problematic pharmaceutical opioid use among people prescribed opioids for chronic noncancer pain: do different measures identify the same patients?
The prevalence of diagnosed opioid abuse in commercial and Medicare managed care populations
A model to identify patients at risk for prescription opioid abuse, dependence, and misuse
Subtypes in patients with opioid misuse: A prognostic enrichment strategy using electronic health record data in hospitalized patients
Using natural language processing to identify problem usage of prescription opioids
Identifying opioid use disorder in the emergency department: multi-system electronic health record-based computable phenotype derivation and validation study
Predicting opioid dependence from electronic health records with machine learning
Automated prediction of risk for problem opioid use in a primary care setting
The prevalence of problem opioid use in patients receiving chronic opioid therapy: computer-assisted review of electronic health record clinical notes
Assessment of probable opioid use disorder using electronic health record documentation
Using machine learning to predict risk of incident opioid use disorder among fee-for-service Medicare beneficiaries: A prognostic study
Publicly available machine learning models for identifying opioid misuse from the clinical notes of hospitalized patients
Robust replication of genotype-phenotype associations across multiple diseases in an electronic medical record
Rates of opioid misuse, abuse, and addiction in chronic pain: a systematic review and data synthesis
Phenotype risk scores identify patients with unrecognized Mendelian disease patterns
Phenotype risk scores (PheRS) for pancreatic cancer using time-stamped electronic health record data: Discovery and validation in two large biobanks
Phenome-wide association studies uncover a novel association of increased atrial fibrillation in male patients with systemic lupus erythematosus
Evaluating statistical approaches to leverage large clinical datasets for uncovering therapeutic and adverse medication effects
Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record
SpaCy models for biomedical text processing
Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition
Understanding inverse document frequency: on theoretical arguments for IDF
Diagnostic and statistical manual of mental disorders
The addiction behaviors checklist: validation of a new clinician-based measure of inappropriate opioid use in chronic pain
Using natural language processing to identify problem usage of prescription opioids