key: cord-0682334-l5ccli1s authors: O’Leary, Timothy J. title: Relative Sensitivity of Saliva and Upper Airway Swabs for Initial Detection of SARS-CoV-2 in Ambulatory Patients: Rapid Review date: 2021-01-01 journal: J Mol Diagn DOI: 10.1016/j.jmoldx.2020.12.008 sha: 2534421f30c45ac96d8b1341e8fc26a97f97ea3a doc_id: 682334 cord_uid: l5ccli1s Saliva has been proposed as an alternative to upper airway swabs when testing for SARS-CoV-2. Although some studies have suggested higher viral loads and clinical sensitivity when testing saliva, studies have been relatively small and have given rise to contradictory results. To better understand the relative performance characteristics of saliva and upper airway samples, I performed a rapid systematic review (registered on PROSPERO as CRD42020205035), focusing on studies that included at least 20 subjects who provided diagnostic saliva and upper airway samples on the same day, in which the samples were tested by nucleic acid amplification methods, and for which a confusion matrix could be constructed based on a composite reference standard. Nineteen studies comprising 21 cohorts that met predetermined acceptance criteria were identified following a search of PubMed, medRxiv and bioRxiv. Seven of these cohorts were incorporated into a meta-analysis using a random effects model, which suggests that NP swabs are somewhat more sensitive than saliva samples for the diagnosis of early disease in ambulatory patients, such as those tested in drive-through centers or community health centers. Nevertheless, the difference is small, and the reduced need for personal protective equipment for saliva sampling may justify accepting it. Conclusions are limited by the significant heterogeneity of disease prevalence in the study populations and variation in the approaches to saliva sample collection.
Rapid identification of SARS-CoV-2 viral infection is important for treatment of symptomatic individuals, for disrupting transmission by asymptomatic carriers, and for understanding the dynamics of infection in communities 1. Although flocked swabs used to obtain nasopharyngeal (NP) specimens have constituted the "gold standard" for upper respiratory system (URS) sampling, specimens obtained by swabbing the oropharynx (OP), mid-turbinate (MT) and anterior nares (AN, also known as "nasal") have all served as alternatives, both as a result of supply shortages and because these sites can be self-sampled, reducing clinician exposure and the need for personal protective equipment. Saliva, which can also be obtained without clinician assistance, has been proposed as a safe, easy and comfortable way to obtain samples for covid19 testing 2, but published studies have been based on heterogeneous study populations and have given conflicting results [3] [4] [5] [6] [7] [8]. In this paper we set out to answer the question "When nucleic acid amplification tests (NAAT) for SARS-CoV-2 are employed for initial diagnosis, what is the relative sensitivity for detection of virus when saliva samples are used rather than nasopharyngeal, oropharyngeal, mid-turbinate or anterior nares swabs?" To answer this question, we conducted a rapid systematic review based on PRISMA principles 9, using an analysis that relies on a composite reference standard 10 based upon both swabs and saliva samples; this approach is not biased against either sample type. The study is registered on PROSPERO (CRD42020205035), but a complete protocol has not been published. PubMed, medRxiv and bioRxiv were searched over the interval from January 1, 2020 to August 17, 2020.
Preliminary searches showed that a combination of the terms covid19 and saliva was able to capture all or nearly all relevant papers in which synonymous terms such as "SARS-CoV-2," "novel coronavirus," or "oral fluid" appeared. For this reason, a simple search string "saliva covid19 sensitivity" was used with all three databases to identify papers for further screening. Several additional papers were identified after initial peer review. Following the initial identification of papers, the titles and abstracts were screened to eliminate papers not meeting the prespecified inclusion criteria. Papers remaining after this process were rescreened, particularly since many of the papers reviewed were in the form of research letters that did not have an abstract. Ultimately, 19 papers that met inclusion criteria were available for analysis, as shown in the PRISMA flow diagram 11 (Supplemental Figure S1). To be included in this systematic review, studies were required to include a minimum of 20 individual subjects; each subject must have had both a saliva specimen and at least one of several URS swab specimens, namely nasopharyngeal (NPS), oropharyngeal (OPS), mid-turbinate (MTS) or anterior nares (AN, or "nasal"), obtained on the same day. Papers that reported on tongue and cheek swabbing, without a specific effort to soak up saliva, were not included. Each of these specimens must have undergone analysis for SARS-CoV-2 sequences using either an isothermal amplification or an RT-PCR-based method. Results must have been reported in a manner that allowed construction of a confusion matrix, based on a single sample pair per patient, including the saliva-based and upper airway-based test. If a paper reported patients who were tested multiple times, and a confusion matrix could not be constructed that reflected results of the first NPS/saliva pair, that study was excluded.
Studies in which "discrepant analysis" was used to resolve diagnostic conflicts between the two sites were not to be included unless data could be analyzed independently of the discrepant analysis. In the event that multiple time points were included in one of the included studies, only the first time point was to be used in our analysis. If confusion matrices could only be constructed from data involving multiple time points from the same patients, the study was excluded. No attempt was made to obtain data from the investigators involved in these published studies. When papers that were identified on bioRxiv or medRxiv were compared to those on PubMed, an additional 6 duplicate papers were eliminated. Four papers that were not identified by the search strategy were added. The potential for bias associated with each study was evaluated using the QUADAS-2 instrument 12,13. The risk of spectrum bias was assessed from the perspective of testing as an initial diagnostic method for ambulatory patients; the bias assessment does not constitute a judgement on the quality of the study, which may have been performed to demonstrate assay validity, to assess recovery, or for other purposes different from that for which we evaluated potential bias. Seven papers with a low risk of bias were deemed appropriate to include in a meta-analysis and were analyzed using a random-effects model (DerSimonian-Laird) 14 as implemented by the OpenMetaAnalyst 15 software program. A predetermined data extraction form included study author, type of study, inclusion and exclusion criteria, setting, sample types, swab types, transport medium, manufacturer or description of nucleic acid amplification assays, as well as space to record study results in the form of confusion matrices.
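The DerSimonian-Laird pooling named in the methods can be illustrated with a minimal sketch. It is not the OpenMetaAnalyst implementation: the logit transform, the 0.5 continuity correction, and the function names (`logit_and_variance`, `dersimonian_laird`, `pooled_sensitivity`) are assumptions made for illustration, and the counts in the usage note are hypothetical rather than taken from any reviewed study.

```python
from math import exp, log, sqrt


def logit_and_variance(detected, crs_positive, cc=0.5):
    """Logit of a sensitivity and its approximate variance.

    A continuity correction (assumed, cc=0.5) is applied only when the
    proportion is exactly 0 or 1, where the logit is undefined.
    """
    if detected == 0 or detected == crs_positive:
        detected += cc
        crs_positive += 2 * cc
    missed = crs_positive - detected
    return log(detected / missed), 1.0 / detected + 1.0 / missed


def dersimonian_laird(effects, variances):
    """Return (pooled effect, standard error, tau^2) for a random-effects model."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q measures between-study heterogeneity.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to every within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return pooled, sqrt(1.0 / sum(re_weights)), tau2


def pooled_sensitivity(counts, z=1.959964):
    """counts: list of (detected, crs_positive) pairs, one per cohort.

    Returns (estimate, lower, upper, tau^2) on the proportion scale.
    """
    pairs = [logit_and_variance(d, n) for d, n in counts]
    pooled, se, tau2 = dersimonian_laird([p[0] for p in pairs],
                                         [p[1] for p in pairs])

    def inverse_logit(x):
        return 1.0 / (1.0 + exp(-x))

    return (inverse_logit(pooled), inverse_logit(pooled - z * se),
            inverse_logit(pooled + z * se), tau2)
```

Calling `pooled_sensitivity([(90, 100), (95, 100), (88, 100)])` with hypothetical per-cohort counts returns the pooled estimate, its approximate 95% limits, and the between-study variance. Pooling on the logit scale keeps the back-transformed limits inside the 0 to 1 range, which is one reason this transform is a common choice for proportions.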
After initial compiling and tabulation of data, a second review was undertaken to determine whether saliva samples involved coughing or in other ways included sputum in the sample; this step was not part of the original protocol. Because the choice of any particular sample type as a "gold standard" provides a biased estimate of relative sensitivity when compared with all other sample types, a composite reference standard (CRS) 10 was computed for each study on the basis of all sample types included in the study, when possible. For one study in which results were not presented in a manner that made this possible, a CRS was computed individually for comparisons of each upper airway sample with the saliva sample. Equivocal results and assay failures were not used in the calculation of sensitivity, nor in the construction of the CRS for each study. Confidence limits for sensitivity were computed using Newcombe's efficient score method 16 as implemented in the Vassarstats Clinical Calculator 1 (http://vassarstats.net/) (Table 1). One hundred forty-nine papers were considered for inclusion; of these, 19 studies comprising 21 cohorts met inclusion criteria (Supplemental Figure S1). A brief summary of the studies included in this review may be found in Table 1, and a brief discussion of each paper (including the results used in this review) is presented in the Supplemental Appendix S1 [3] [4] [5] [6] [7] [8] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29]. Twelve of the included cohorts involved 100 or more patients. Two of the studies separately presented data for two cohorts; one study presented data for a single cohort in which two approaches were used to collect saliva specimens 25. The risk of spectrum bias associated with the study population, or method of recruitment, was rated as either "high" or "unclear" for 12 studies. This was the most common concern raised in the quality assessment.
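The composite reference standard and the score-based confidence limits described in the methods can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the Vassarstats calculator: the function names are hypothetical, the interval is the Wilson score form without continuity correction (the calculator may apply a correction), and the counts in the usage note are invented for the example.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a proportion, without continuity correction."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def crs_sensitivity(both_pos, saliva_only, swab_only):
    """Sensitivity of each sample type against a composite reference standard.

    A subject counts as CRS-positive if either sample is positive, so
    every positive result is treated as a true positive. Subjects who
    are negative on both samples do not enter the sensitivity estimate.
    """
    crs_positives = both_pos + saliva_only + swab_only
    saliva_detected = both_pos + saliva_only
    swab_detected = both_pos + swab_only
    return {
        "crs_positives": crs_positives,
        # Each entry is (point estimate, (lower limit, upper limit)).
        "saliva": (saliva_detected / crs_positives,
                   wilson_interval(saliva_detected, crs_positives)),
        "swab": (swab_detected / crs_positives,
                 wilson_interval(swab_detected, crs_positives)),
    }
```

For a hypothetical cohort with 40 subjects positive on both samples, 3 positive on saliva only, and 7 positive on swab only, `crs_sensitivity(40, 3, 7)` gives 50 CRS positives, a saliva sensitivity of 43/50, and a swab sensitivity of 47/50, each with its interval.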
Studies with a high or unclear risk of bias were characterized by failure to present patient symptom status (one study), inclusion of subjects who had previously tested positive or been hospitalized for SARS-CoV-2 (eight studies), focus on healthcare workers (one study), or possibly allowing provider discretion, rather than sequential or random consent, for enrollment (two studies). Although the reviewed studies did not explicitly state that the interpretation of tests based on saliva samples was conducted without knowledge of upper airway results (or vice versa), the manner in which RT-PCR testing for SARS-CoV-2 is conducted in clinical laboratories makes it highly unlikely that the specimen flow or the interpretation of RT-PCR tests would suffer from significant bias. One study intentionally "built in" a bias associated with testing saliva specimens using the ID NOW assay (Abbott Scarborough), while using RT-PCR for testing NP swabs 24. Only seven studies reported limits of detection for their RT-PCR assays. In one study MTS served as the sole upper airway sample, and in one study an NP/OP swab was obtained. In all of the remaining studies an NP swab sample was evaluated. Four studies also evaluated AN swabs, and one evaluated OP swabs. The prevalence of SARS-CoV-2 infection within the cohorts, as measured using the CRS, varied from 0.02% to 95%. In 15 of the 21 cohorts studied, NP swabs demonstrated a sensitivity of greater than or equal to 90% by comparison with the composite reference standard. In three of the four studies which also evaluated AN swabs, NP swabs appeared to be somewhat superior. When compared with saliva samples, anterior nasal swabs performed better in one study, equally well in a second, and less well in a third study associated with a low risk of spectrum bias, with widely overlapping confidence intervals in all cases.
In two of the 21 cohorts represented in Table 1, saliva and upper airway samples demonstrated identical sensitivity, while in 7 cohorts, saliva demonstrated greater sensitivity. In one cohort, supervised saliva collection was more sensitive than URS swab, and unsupervised saliva collection was less sensitive 25. In the remaining cohorts, saliva samples demonstrated lower sensitivity than URS swabs in comparison with the CRS. However, most studies demonstrated wide overlap in the confidence intervals for saliva and URS specimens. The studies in which overlap was not observed included a study of hospitalized patients who were moderately to severely ill 18, and a study of patients in a quarantine center who were tested 8 to 10 days after initial confirmation of SARS-CoV-2 infection 27. One of these studies reported saliva to be more sensitive, while the other reported NP swabs to be more sensitive. Both studies were rated as having a high risk of spectrum bias. The third study, which was thought to have a low risk of spectrum bias, found saliva to be less sensitive than NP swab 8. Importantly, 5 of the 7 cohorts in which saliva provided greater sensitivity were thought to be associated with a high risk of spectrum bias with respect to initial diagnostic testing of a community population 21, 22, 25, 27, 29. Of 10 cohorts that included patients who had previously tested positive for SARS-CoV-2, or included hospitalized patients, four showed higher sensitivity for saliva specimens 21, 25, 27, 29, two showed sensitivity identical to that for NP swabs 23, 28, and four showed higher sensitivity for NP or MT swabs 17, 18, 20, 26. A single study of asymptomatic health care workers identified four positives from saliva samples and none using NP swabs 22.
In contrast, 7 of 9 cohorts believed to have a low risk of spectrum bias with respect to diagnosis demonstrated higher sensitivity for NP swabs than for saliva specimens [3] [4] [5] [6] 8, 19. Three studies suggested similar or greater sensitivity for saliva sampling than for nasal self-swabs, but confidence intervals are wide and one study is associated with a significant risk of spectrum bias 6, 7, 25. Seven cohorts associated with either drive-through centers or community health centers, and believed to have a low risk of spectrum bias (see Supplemental Appendix S1), were included in the initial meta-analysis [3] [4] [5] [6] [7] [8]. Studies based on hospitalized patients were excluded from the meta-analysis, because such patients tend not to be representative of those in whom initial diagnostic testing is performed. Results of this meta-analysis are shown as forest plots in Figure 1. It is difficult to ascertain why the Becker study 8 is an outlier. This was one of two studies in this review to use an RNA stabilizer with saliva (Supplemental Table S1). The other study to do this found a higher sensitivity for saliva samples than NP swabs when saliva sample collection was supervised, but a lower sensitivity when saliva sample collection was unsupervised 25. The precise approach used for saliva collection differed between these two studies, but the results hint at a possibility that mixing of stabilizer with saliva may not have been optimal; it seems unreasonable to suppose that use of RNA stabilizer is intrinsically unfavorable to subsequent PCR analysis. There is no evidence to suggest that use of viral transport medium (VTM), phosphate buffered saline (PBS), or Tris-EDTA (TE) during saliva collection or transport significantly affects subsequent assay sensitivity. Nasopharyngeal swabs have been the "gold standard" for diagnosis of SARS-CoV-2 infection. The studies we have examined give us no reason to question that belief.
Only three of the studies considered here show a diagnostic sensitivity less than 90% for NP swabs; one of these, a study of hospital workers, showed no positive results from NP swabbing! Our meta-analysis, based on cohorts undergoing initial diagnostic testing, gives an estimate of 94% (91-97%) sensitivity for NP swabs; estimating the sensitivity by simply adding the positives for the seven cohorts included in the meta-analysis gives an estimate of 95% (92-98%) sensitivity for NP swabs. These estimates are similar to that obtained by repeat testing 31, and provide a fairly high degree of confidence regarding NP swab sensitivity for initial diagnosis of SARS-CoV-2 infection when assessed using the composite reference standard. The meta-analysis suggests that, when employed for initial diagnosis of SARS-CoV-2 infection in ambulatory patients, saliva samples are somewhat less likely than NP swab samples to give a positive NAAT result when compared to the CRS. Nevertheless, the performance of saliva-based testing is generally quite good, and the reduction of sensitivity may be offset by considerations of patient comfort (which might promote more frequent surveillance), as well as reduction in requirements for swabs and personal protective equipment. Three cohorts included in this review dealt with testing of asymptomatic individuals 19, 22. Data from two of the three cohorts suggest superiority for saliva, although only one, a contact-tracing cohort, includes more than five positive subjects. Another study by Wyllie et al 2, which did not meet the formal inclusion criteria for this systematic review, arrived at the same conclusion. The low numbers of positive specimens identified in these studies limit confidence in any conclusion of superiority for saliva, but support the conclusion that saliva sampling is sufficiently sensitive for screening, as does a study that appeared after the inclusion period for this systematic review 32.
The focus of our systematic review was initial diagnosis of SARS-CoV-2 infection in ambulatory subjects, and our conclusion regarding the relative sensitivity of saliva vs NP swabs should not be assumed to apply to patients who are hospitalized or who present later in the course of disease; several of the studies included in this review suggest that saliva sampling may be more sensitive later in disease, particularly when efforts are made to assure that sputum is included in the specimen 27, 28. None of the studies included in our meta-analysis involved attempts to collect lower-airway secretions together with saliva, and the heterogeneity of collection techniques used in studies reported here precludes any conclusion regarding the optimal method for collecting saliva. This analysis has a number of limitations. The use of a CRS, which defines the false-positive rate as zero for all assays, introduces a downward bias in the estimates of sensitivity 33. This bias varies somewhat based on study size and precise cell frequencies, but is expected to result in a similar degree of bias (about 1-1.5%) for estimates of both saliva sample performance and upper airway sample performance; the conclusions regarding relative sensitivity are not changed. The use of an abbreviated search strategy may have missed papers that otherwise met inclusion criteria. The inclusion criteria excluded small studies (<20 subjects) which might nevertheless have provided additional information regarding relative performance. The design and the review were the product of a single individual, which may have introduced bias into both decisions regarding study design and inclusion of papers in the meta-analysis. Finally, the studies included in the review demonstrate considerable heterogeneity in assay design, patient population, and disease prevalence.
Thus, this systematic review and meta-analysis may provide more insight into the range of potential results expected for labs introducing saliva-based testing than they do for the specific performance of any detailed sampling and assay strategy. In spite of these weaknesses, this systematic review has several offsetting strengths. It focuses on sample sets that have been taken at the same time, thus differing from other systematic reviews 34-38, as well as from a release from the Norwegian Institute of Public Health (https://www.fhi.no/globalassets/dokumenterfiler/rapporter/2020/saliva-sample-for-testing-sars-cov-2infection-memo-2020.pdf, last accessed 12/14/2020). The total number of samples included in the meta-analysis is substantially greater than that of any individual study based on simultaneous sample comparison, and allows for meta-analysis-based sensitivity estimates. This likely yields a more accurate assessment of performance in community-based initial testing. Though there is no true "gold standard" for diagnosis of covid19 infection, the use of a CRS, a relatively unbiased approach to creation of a reference standard, is more likely to provide a better sense of the performance difference associated with saliva sampling vs upper airway sampling than alternative approaches based on intrinsically biased comparisons. The sensitivity difference between tests based on saliva samples and those based on NPS is both small and, for broad community testing, justified by both patient comfort and the reduced need for personal protective equipment.
Testing for Severe Acute Respiratory Syndrome-Coronavirus 2: Challenges in Getting Good Specimens, Choosing the Right Test, and Interpreting the Results
Saliva or Nasopharyngeal Swab Specimens for Detection of SARS-CoV-2
Saliva-Based Molecular Testing for SARS-CoV-2 that Bypasses RNA Extraction
Challenges in use of saliva for detection of SARS CoV-2 RNA in symptomatic outpatients
Saliva sample as a non-invasive specimen for the diagnosis of coronavirus disease 2019: a cross-sectional study
Evaluation of specimen types and saliva stabilization solutions for SARS-CoV-2 testing
Self-Collected Anterior Nasal and Saliva Specimens versus Healthcare Worker-Collected Nasopharyngeal Swabs for the Molecular Detection of SARS-CoV-2
Saliva is less sensitive than nasopharyngeal swabs for COVID-19 detection in the community setting
Recommendations for reporting of systematic reviews and meta-analyses of diagnostic test accuracy: A systematic review
Utility of composite reference standards and latent class analysis in evaluating the clinical accuracy of diagnostic tests for pertussis
Quadas-2: A revised tool for the quality assessment of diagnostic accuracy studies
The revised QUADAS-2 tool
Meta-analysis in clinical trials
Closing the gap between methodologists and end-users: R as a computational back-end
Two-sided confidence intervals for the single proportion: Comparison of seven methods
EasyCOV : LAMP based rapid detection of SARS-CoV-2 in saliva
Validation of a Self-administrable, Saliva-based RT-qPCR Test Detecting SARS-CoV-2. MedRxiv
Comparison of saliva and oro-nasopharyngeal swab sample in the molecular diagnosis of COVID-19
Posterior Oropharyngeal Saliva for the Detection of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)
Comparison of SARS-CoV-2 detection in nasopharyngeal swab and saliva
Saliva offers a sensitive, specific and non-invasive alternative to upper respiratory swabs for SARS-CoV-2 diagnosis
Sensitivity of nasopharyngeal swabs and saliva for the detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
Saliva as an Alternate Specimen Source for Detection of SARS-CoV-2 in Symptomatic Patients Using Cepheid Xpert Xpress SARS-CoV-2
Mass screening of asymptomatic persons for SARS-CoV-2 using saliva
Self-Collected Oral Fluid and Nasal Swab Specimens Demonstrate Comparable Sensitivity to Clinician-Collected Nasopharyngeal Swab Specimens for the Detection of SARS-CoV-2
Saliva for use with a point of care assay for the rapid diagnosis of COVID-19
Does sampling saliva increase detection of SARS-CoV-2 by RT-PCR? Comparing saliva with oro-nasopharyngeal swabs
Comparing nasopharyngeal swab and early morning saliva for the identification of SARS-CoV
Measuring inconsistency in meta-analyses
Occurrence and Timing of Subsequent SARS-CoV-2 RT-PCR Positivity Among Initially Negative Patients
Real-Time RT-PCR Tests on Oral Rinses and Saliva Samples
Performance of methods for meta-analysis of diagnostic test accuracy with few studies or sparse data
Saliva as a non-invasive sample for the detection of SARS-CoV-2: a systematic review
Saliva as a Candidate for COVID-19 Diagnostic Testing: A Meta-Analysis
Rapid systematic review of the sensitivity of SARS-CoV-2 molecular testing on saliva compared to nasopharyngeal swabs
The diagnostic accuracy of nucleic acid point-of-care tests for human coronavirus: A systematic review and meta-analysis
The effectiveness of tests to detect the presence of SARS-CoV-2 virus, and antibodies to SARS-CoV-2, to inform COVID-19 diagnosis: a rapid systematic review
*Potential for spectrum bias was evaluated in terms of the enrolled cohort. Although a group of 200 consecutively enrolled hospital patients would not be considered as suffering from selection bias, it would be viewed as having a high potential for spectrum bias (with regards to this study) since all patients were sufficiently ill as to require hospitalization. Similarly, a group of patients selected on the basis of RT-PCR Ct values would be considered biased (no matter what those values were).
†Data is not presented in a way that allows creation of a composite reference that includes all three specimen types. Sensitivity of saliva samples and nasal samples are each computed from separate composite references that include saliva/NP and nasal/NP respectively.
‡Although the paper did not explicitly identify these patients as symptomatic, at the time the work was done symptomatic patients were the focus of most community testing.
§NP samples were tested using RT-PCR, while the saliva samples were tested using ID NOW. Thus, index test bias (from the perspective of this systematic review) was intentionally built into the design of this study.
¶Supervised saliva collection