key: cord-0002389-cdfn9n1i
authors: Pahuta, M.; Smolders, J. M.; van Susante, J. L.; Peck, J.; Kim, P. R.; Beaule, P. E.
title: Blood metal ion levels are not a useful test for adverse reactions to metal debris: A systematic review and meta-analysis
date: 2016-09-09
journal: Bone Joint Res
DOI: 10.1302/2046-3758.59.bjr-2016-0027.r1
sha: ff365ebbc0fc55476886b0abd129e227c1f8a527
doc_id: 2389
cord_uid: cdfn9n1i

OBJECTIVES: Alarm over the reported high failure rates for metal-on-metal (MoM) hip implants, as well as their potential for locally aggressive Adverse Reactions to Metal Debris (ARMDs), has prompted government agencies internationally to recommend the monitoring of patients with MoM hip implants. Some have advised that a blood ion level >7 µg/L indicates potential for ARMDs. We report a systematic review and meta-analysis of the performance of metal ion testing for ARMDs.

METHODS: We searched MEDLINE and EMBASE to identify articles from which it was possible to reconstruct a 2 × 2 table. Two readers independently reviewed all articles and extracted data using explicit criteria. We computed a summary receiver operating characteristic curve using a Bayesian random-effects hierarchical model.

RESULTS: Our literature search returned 575 unique articles; only six met inclusion criteria defined a priori. The discriminative capacity of ion tests was homogeneous across studies, but there was substantial cut-point heterogeneity. Our best estimate of the "true" area under the curve (AUC) for metal ion testing is 0.615, with a 95% credible interval of 0.480 to 0.735. Thus, the probability that metal ion testing is actually clinically useful, with an AUC ≥ 0.75, is 1.7%.

CONCLUSION: Metal ion levels are not useful as a screening test for identifying high-risk patients because ion testing will either lead to a large burden of false-positive patients or only marginally modify the pre-test probability.
With the availability of more accurate non-invasive tests, we did not find any evidence for using blood ion levels to diagnose symptomatic patients.

Cite this article: M. Pahuta, J. M. Smolders, J. L. van Susante, J. Peck, P. R. Kim, P. E. Beaule. Blood metal ion levels are not a useful test for adverse reactions to metal debris: a systematic review and meta-analysis. Bone Joint Res 2016;5:379–386. DOI: 10.1302/2046-3758.59.BJR-2016-0027.R1.

Although total hip arthroplasty (THA) has been touted as the operation of the 21st century, orthopaedic researchers continue to propose new designs in an effort to improve implant longevity and patient function. 1 One such purported improvement was the use of a large femoral head with a metal-on-metal (MoM) bearing; this design failed and has become a significant public health concern, leading to the withdrawal and even recall of certain implant designs. 2, 3 This is yet another example of the flawed cycle of innovation in arthroplasty, in which new implants actually underperform relative to the existing standard. 4 The alarm over MoM hip implants stems from reported high failure rates and the potential for locally aggressive ion-induced local tissue reactions such as pseudotumours, a type of adverse reaction to metal debris (ARMD). 5 It has been estimated that over a million patients have received these implants worldwide, which poses a significant challenge for monitoring and advising patients on both short- and long-term performance. 5

In the 1980s, MoM bearings were introduced as an improvement on standard metal-on-polyethylene (MoP) bearings. MoM bearings generate less volumetric wear, which may translate into longer implant survival. 6 In addition, MoM bearings allow larger head sizes, which provide greater hip stability and range of movement. Furthermore, MoM bearings allow for bone-conserving hip resurfacing (HR), which is important for younger patients who will eventually require revision surgery.
7 Both MoP and MoM bearings have been associated with elevated systemic levels of metal ions and ARMDs; however, ion levels have been shown to be consistently higher in patients with MoM bearings, and ARMDs appear to be more common in these patients. [8] [9] [10] [11] ARMDs span a spectrum of aseptic necrotic effusions that include pseudotumours as well as aseptic lymphocyte-dominated vasculitis-associated lesions (ALVAL) (pericapsular hypersensitivity reactions associated with osteolysis). [12] [13] [14] When patients present with mechanical symptoms or pain following a MoM hip implant, unlike with a MoP hip implant, expectant management may not be advisable because ARMD lesions can lead to progressive tissue destruction that compromises reconstructive options. [15] [16] [17] As these lesions are diagnosed with ultrasound (US) or metal artefact reduction sequence magnetic resonance imaging (MARS MRI), there has been interest in developing an inexpensive and rapid laboratory test. 18

Government agencies worldwide have published recommendations on the surveillance and work-up of patients with MoM hip implants. The United Kingdom's Medicines and Healthcare Regulatory Agency (Medicines and Healthcare Regulatory Agency 2012), 19 22 and the Therapeutic Goods Administration of Australia (Therapeutic Goods Administration 2012) 23 have recommended close follow-up of patients, even those with well-functioning implants. Surveillance with metal ions has been recommended for patients with the ASR implant (DePuy Synthes, Warsaw, Indiana) in the United Kingdom, with large-diameter THA and small-diameter hip resurfacing (HR) in Australia, and with large-diameter THA and any HR in Europe. The recommended work-up of symptomatic patients by each of these organisations includes blood metal ion assessment.
To date, the most detailed guidelines were put forth by the United Kingdom's Medicines and Healthcare Regulatory Agency, which advised that a blood metal ion level > 7 µg/L indicates potential for soft-tissue reaction. As noted by others, ion level cut-offs are arbitrary and not supported by scientific data. 5, [24] [25] [26] Given the uncertain utility of laboratory testing in the surveillance and investigation of patients with MoM hip implants, there is a need to synthesise the evidence for the measurement of blood cobalt and chromium ion concentrations. In this paper we report a systematic review and meta-analysis of the screening and diagnostic value of metal ion testing for ARMDs.

Literature search and data extraction. We conducted an electronic search to identify relevant articles that reported original research findings including blood ion concentrations in patients with total hip arthroplasty (THA) or hip resurfacing (HR). MEDLINE (between January 1946 and 15th February 2015) and EMBASE (between January 1974 and February 2015) were searched for relevant publications with the assistance of a clinical librarian. The electronic search was individually tailored to each database to maximise sensitivity (see Appendix 1). We supplemented the electronic search by using Scopus to obtain the articles referenced by, and the articles citing, each of the articles ultimately included in the meta-analysis. Two readers (a fellowship-trained joint reconstruction surgeon and an orthopaedic surgery resident) independently reviewed all articles using explicit criteria, and recorded assessments using a standard computerised form. A third reader (a fellowship-trained joint reconstruction surgeon) resolved any disagreements. Readers screened titles and abstracts to exclude animal and basic science studies, review articles, guidelines, and editorials.
From the full text, the readers identified articles evaluating a hip prosthesis (either HR or THA) and reporting cobalt and/or chromium blood ion concentrations. We included articles in the meta-analysis if it was possible to reconstruct a 2 × 2 table for the use of blood ion measurements as a test for ARMDs. Only studies that evaluated metal ion levels for screening or for diagnosis of symptomatic patients were eligible. Studies that only recruited patients who underwent revision surgery were excluded (three studies, 81 hips). In these papers, the decision making for revision was not clearly described. Consequently, we felt that there was a high risk of bias from the spectrum effect, as ion levels were likely used in the decision-making process. 27 Eligible measures of diagnostic performance included: sensitivity; specificity; predictive values; likelihood ratios; diagnostic odds ratios; and receiver operating characteristic (ROC) curves. If an odds ratio for an ion cut-point used as a covariate in logistic regression was reported, we deemed the study eligible. Reviewers collected the following covariates: country of study; inclusion criteria; benchmark test; index test; number of patients; ARMD prevalence; number of revisions; and prevalence of symptomatic patients. Reviewers also assessed the quality of studies using the QUADAS-2 tool. 28

Statistical analysis. We selected the diagnostic performance meta-analytic technique according to the algorithm proposed by Chappell, Raab and Wardlaw. 29 Ultimately, we computed a summary receiver operating characteristic (SROC) curve using a random-effects hierarchical SROC (HSROC) model controlling for the different cut-points reported (see Appendix 2). 30, 31 We quantified heterogeneity by comparing the widths of 95% confidence intervals (CIs) and 95% prediction intervals (PIs). 32

Implementation.
We performed Bayesian computation for both the diagnostic performance and normative ion level meta-analyses using R (R Foundation, Vienna, Austria) and the Bayesian modeling language, Stan. 33 We ran four chains for 5000 iterations, discarded the first 2500 and used a thinning interval of five iterations. We assessed appropriate sampling of chains graphically by ensuring mixing on trace plots, and convergence by ensuring the Gelman-Rubin statistic was < 1.2. 34, 35 We used non-informative prior distributions.

Our literature search identified 575 unique references (Fig. 1, flow of studies through the selection process). Six met the selection criteria defined a priori (see Appendix 1). 9,11,36-39 We contacted the authors of two studies 9,11 reporting logistic regression with a > 5 µg/L blood ion cut-point for additional data. These six studies included a total of 898 hips, of which 376 had an ARMD. The prevalence of ARMDs ranged from 29% to 69%, and the prevalence of symptoms ranged from 23.6% to 100% (Figs 2f and 2g, Appendices 3, 4, 5, 6). Studies differed in the blood fraction tested, the ion measured and the cut-point used. Only three of the six studies (50%) 36,38,39 used MARS MRI as the benchmark for diagnosis. Only one study 36 used blood ion levels in a diagnostic context (symptomatic patients), whereas the remaining five studies 9,11,37-39 used blood ion levels in a screening context (undifferentiated patients). The three studies 9,11,37 not using MARS MRI were deemed at risk of bias (Table I, Fig. 2c, Appendix 3). Two studies 36,37 used plasma, rather than serum, for ion testing and were therefore deemed to have concerns regarding applicability (Table I, see also Appendix 3). No study described the time interval between ion testing and imaging, and therefore all were deemed to have concerns regarding applicability (Table I, see also Appendix 3). All studies were of either level I or level II quality.
40 Prior to proceeding with meta-analysis, we investigated whether clinical, methodological and quality variability manifested as heterogeneity in the estimates of diagnostic accuracy obtained from each study. Study-specific estimates of specificity and sensitivity all appeared to lie close to a common smooth ROC curve (Fig. 2a); however, there appeared to be variability in the performance of specific cut-points across different studies (Fig. 2b). This suggests that the discriminative capacity of the ion test is homogeneous across studies but that there is substantial cut-point heterogeneity. The cut-point heterogeneity may be due to heterogeneity in the benchmark modality used: the studies reporting unexpectedly high specificity and low sensitivity at the 5 µg/L and 7 µg/L cut-points did not exclusively use MARS MRI as the benchmark (Fig. 2c). Sample size, ARMD prevalence, prevalence of symptomatic patients, and ion test characteristics were not associated with cut-point heterogeneity (Figs 2d, 2e, 2f and 2g). Given the homogeneity in discrimination capacity but implicit cut-point heterogeneity, we pursued SROC meta-analysis without meta-analysing cut-points (see Appendix 2).

Our best estimate of the "true" ROC curve for the metal ion test is the mean SROC curve plotted in Figure 3. However, due to random variability, the "truth" may not be the same in all studies. Accounting for this random variability, we have 95% confidence that the study-specific "truth" will lie within the 95% prediction region. The prediction and credible regions have similar widths, which further supports minimal heterogeneity. The area under the curve (AUC) for the SROC curve was 0.615 (95% CI 0.480 to 0.735); thus, the probability that metal ion testing is actually clinically useful, with an AUC ≥ 0.75, is 1.7% (see Appendix 2). 41 Due to implicit cut-point heterogeneity, we did not perform meta-analysis of cut-point performance (see Appendix 2).
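The tail probability quoted above comes from the posterior distribution of the AUC. As a rough illustration only, the sketch below replaces that posterior with a normal distribution on the logit scale, fitted to the reported mean (0.615) and 95% interval (0.480 to 0.735). This approximation and the function names `logit` and `tail_probability` are assumptions made for the example, not the method used in the paper, which drew samples in Stan.

```python
import math


def logit(p: float) -> float:
    """Log-odds transform, mapping (0, 1) onto the real line."""
    return math.log(p / (1.0 - p))


def tail_probability(mean: float, lower: float, upper: float,
                     threshold: float) -> float:
    """Approximate P(AUC >= threshold).

    Assumption: the posterior of logit(AUC) is normal, centred on
    logit(mean), with a standard deviation backed out from the width
    of the reported 95% interval (lower, upper).
    """
    mu = logit(mean)
    sd = (logit(upper) - logit(lower)) / (2.0 * 1.959964)
    z = (logit(threshold) - mu) / sd
    # Upper-tail area of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Reported summary values from the meta-analysis.
p_useful = tail_probability(mean=0.615, lower=0.480, upper=0.735,
                            threshold=0.75)
print(f"Approximate P(AUC >= 0.75): {p_useful:.3f}")
```

Under this approximation the probability is about 1%, the same order of magnitude as the 1.7% obtained from the full posterior. The small difference is expected, since the true posterior need not be exactly normal on the logit scale.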
Because cut-point performance was not meta-analysed, the SROC curve in Figure 3 does not relate cut-points to a particular specificity and sensitivity. However, diagnostic performance at any given cut-point will lie somewhere on the SROC curve in Figure 3; we just do not know where. Hence, our meta-analysis can be used to evaluate the overall performance of ion tests, without reference to a particular cut-point.

This systematic review and meta-analysis is the first synthesis of evidence for the use of blood ion measurements as a test for ARMDs in patients with MoM hip implants. We identified minimal heterogeneity in the inherent discrimination capacity of the ion tests used in each study. Our meta-analysis indicates that blood ion levels are a poor test for classifying patients as having or not having an ARMD.

All but one study included in our review evaluated blood ion levels in a screening context. Estimates of diagnostic accuracy obtained from high-prevalence or symptomatic samples can be biased upwards due to the spectrum effect. 22 The prevalence of ARMDs and of symptomatic patients in the included studies spanned wide ranges (29% to 69% and 23.6% to 100%, respectively); therefore, we could graphically evaluate for a spectrum effect. Since Figures 2f and 2g demonstrated that symptom prevalence was not associated with the operating point, we concluded that there was no spectrum effect in our meta-analysis.

Based on a mean AUC of 0.615, blood ion levels are a poor, and not clinically useful, test for classifying patients as having or not having an ARMD. [41] [42] [43] [44] It has been suggested that a clinically useful test has an AUC ≥ 0.75. 45 Considering the reconstructive consequences of delayed diagnosis, a false-negative result could harm patients. [15] [16] [17] With the availability of non-invasive tests which definitively determine the presence of an ARMD, we see no role for using blood ion levels to diagnose symptomatic patients. 18

Screening is the process of identifying high-risk patients in the general population.
Since screen-positive patients will undergo further testing, screening tests need not be as accurate as diagnostic tests. Screening can use two different approaches: exclude patients with a very low probability of disease from further testing by maximising the negative predictive value (NPV), or identify high-risk patients for further testing by maximising the positive predictive value (PPV). 46 The performance of ion testing using these two approaches is shown in Table II. Calculations were made using the SROC curve plotted in Figure 3 and the mean prevalence of ARMDs in the studies included in this review (41%). Maximising NPV is burdensome because 99% of patients will test positive. Furthermore, test-positive patients have the same probability of disease as they did prior to undergoing the test. On the other hand, maximising PPV does not reassure test-negative patients because they still have a 21% probability of having an ARMD. Test-positive patients are hardly "high risk" because their risk of an ARMD is only marginally different from the pre-test probability (52% versus 41%).

Aside from statistical concerns, screening for ARMDs is problematic on theoretical grounds. The World Health Organization recommends that screening only be performed if patients will be offered treatment. 47 We are unaware of any evidence supporting revision in asymptomatic patients with ARMDs, and thus screening would serve no clinical purpose.

We have synthesised the totality of evidence for the diagnostic value of metal ion levels for ARMDs in patients with MoM hip implants. We conclude that blood ion levels have no role in the diagnostic algorithm for ARMDs. The probability that we have incorrectly calculated the AUC to be less than 0.75 is 1.7%.
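The screening figures discussed above (post-test probabilities at a 41% pre-test probability) follow from Bayes' theorem applied to one operating point on the SROC curve. The sketch below shows that arithmetic. The sensitivity and specificity used in the example call are illustrative assumptions, not the operating points reported in Table II, and the function name `post_test_probabilities` is hypothetical.

```python
def post_test_probabilities(sensitivity: float, specificity: float,
                            prevalence: float) -> dict:
    """Convert one operating point into screening quantities.

    Each cell is expressed as a fraction of all tested patients,
    i.e. a 2 x 2 table normalised to sum to 1.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    return {
        # Share of patients sent on for further testing.
        "test_positive_fraction": true_pos + false_pos,
        # Probability of an ARMD given a positive test.
        "ppv": true_pos / (true_pos + false_pos),
        # Probability of no ARMD given a negative test.
        "npv": true_neg / (true_neg + false_neg),
        # Residual probability of an ARMD given a negative test.
        "risk_if_negative": false_neg / (true_neg + false_neg),
    }


# Illustrative operating point (assumed values) at the mean
# ARMD prevalence reported in the review (41%).
example = post_test_probabilities(sensitivity=0.80, specificity=0.50,
                                  prevalence=0.41)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With these assumed inputs, a positive test raises the probability of an ARMD from 41% to roughly 53%, and a negative test still leaves a risk of roughly 22%. This is the same pattern of marginal change in pre-test probability that the text describes.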
Given the strength and consistency of the findings of our meta-analysis, and the improbability that its results are incorrect, further study of metal ion testing for the diagnosis of ARMDs would be an inefficient use of research resources.

A perceived limitation of our study may be that conclusions are based on a small number of studies, half of which did not use MARS MRI as the benchmark modality. We therefore carefully assessed, and controlled for, heterogeneity. We used a powerful meta-analytic technique that allowed us to partition results into a "cut-point effect" and an "accuracy effect" (see Appendix 2). The methodological heterogeneity only manifested as heterogeneity in the cut-point effect, and not in the accuracy effect. Due to this heterogeneity, our meta-analysis cannot be used to determine a useful cut-point. However, this is a moot point because the accuracy of the test is so poor. It was remarkable that these methodologically heterogeneous studies formed a smooth ROC curve (Fig. 2). There was therefore substantial homogeneity among these studies in the accuracy effect. This homogeneity is further reflected in the fact that the prediction intervals and confidence intervals were nearly equivalent in width (Fig. 3). In other words, our results are tantamount to those from a single study with 898 hips and 376 ARMDs.

We emphasise that our systematic review evaluated the use of blood metal ion levels for the diagnosis of ARMDs. Our findings do not apply to the investigation of the systemic consequences of metal ion exposure, which are believed to occur at levels > 60 µg/L. 48 Further research should be directed to determining how blood ion measurements should be used to investigate cobaltism. 49

We conclude that the available evidence does not support existing guidelines, which recommend the use of blood ion measurements for both screening and diagnosis of ARMD.
Appendices showing the MEDLINE search strategy, ROC curves, study and patient characteristics, pseudotumour detection and concentration results can be found alongside the paper online at http://www.bjr.boneandjoint.org.uk/

Summary receiver operating characteristic curve for meta-analysis. Mean curve (-), 95% credible region (---), and 95% prediction region (…) are shown. Sensitivity and specificity reported by individual studies (•).

References:
The operation of the century: total hip replacement
Out of joint: the story of the ASR
How safe are metal-on-metal hip implants?
Failed innovation in total hip replacement. Diagnosis and proposals for a cure
Risk stratification algorithm for management of patients with metal-on-metal hip arthroplasty: consensus statement of the American Association of Hip and Knee Surgeons, the American Academy of Orthopaedic Surgeons, and the Hip Society
Analysis of 118 second-generation metal-on-metal retrieved hip implants
Metal-on-Metal Bearings in Total Hip Arthroplasty
Metal-on-metal or metal-on-polyethylene for total hip arthroplasty: a meta-analysis of prospective randomized studies
High incidence of pseudotumour formation after large-diameter metal-on-metal total hip replacement: A prospective cohort study
Inflammatory pseudotumor complicating metal-on-highly cross-linked polyethylene total hip arthroplasty
High prevalence of pseudotumors in patients with a Birmingham Hip Resurfacing prosthesis: a prospective cohort study of one hundred and twenty-nine patients
Metal-on-metal bearings and hypersensitivity in patients with artificial hip joints. A clinical and histomorphological study
Pseudotumours associated with metal-on-metal hip resurfacings
Early failure of metal-on-metal bearings in hip resurfacing and large-diameter total hip replacement: A consequence of excess wear
Hip resurfacings revised for inflammatory pseudotumour have a poor outcome
Poor outcome of revised resurfacing hip arthroplasty
Surgical technique: transfer of the anterior portion of the gluteus maximus muscle for abductor deficiency of the hip
The John Charnley Award: diagnostic accuracy of MRI versus ultrasound for detecting pseudotumors in asymptomatic metal-on-metal THA
Medical safety alert: Metal-on-metal (MoM) hip replacements - updated advice with patient follow ups
Concerns about Metal-on-Metal Hip Implants
Final opinion on the safety of Metal-on-Metal joint replacements with a particular focus on hip implants
Metal-on-Metal Hip Implants - Information for Orthopaedic Surgeons Regarding Patient Management Following Surgery - For the Public
Metal-on-metal hip replacement implants
The Hip Society: algorithmic approach to diagnosis and management of metal-on-metal arthroplasty
No authors listed. Food and Drug Agency. FDA Safety Communication: Metal-on-Metal Hip Implants
Otto Aufranc Award: the interpretation of metal ion levels in unilateral and bilateral hip resurfacing
Statistical Methods in Diagnostic Medicine
QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies
When are summary ROC curves appropriate for diagnostic meta-analyses
A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations
Meta-analysis of diagnostic test accuracy assessment studies with varying number of thresholds
Quantifying heterogeneity in a meta-analysis
Stan Modeling Language Users Guide and Reference Manual
Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review
General methods for monitoring convergence of iterative simulations
The sensitivity, specificity and predictive values of raised plasma metal ion levels in the diagnosis of adverse reaction to metal debris in symptomatic patients with a metal-on-metal arthroplasty of the hip
Relationship of plasma metal ions and clinical and imaging findings in patients with ASR XL metal-on-metal total hip replacements
Metal ion levels not sufficient as a screening measure for adverse reactions in metal-on-metal hip arthroplasties
Treatment of pseudotumors after metal-on-metal hip resurfacing based on magnetic resonance imaging, metal ion levels and symptoms
A practical guide to assigning levels of evidence
Understanding receiver operating characteristic (ROC) curves
Receiver operating characteristic (ROC) methodology: the state of the art
The meaning and use of the area under a receiver operating characteristic (ROC) curve
Can routine laboratory tests discriminate between severe acute respiratory syndrome and other causes of community-acquired pneumonia?
Understanding receiver operating characteristic (ROC) curves
Screening: Evidence and practice
Arthroprosthetic cobaltism associated with metal on metal hip implants
Clinical features, testing, and management of patients with suspected prosthetic hip-associated cobalt toxicity: a systematic review of cases