key: cord-308527-scxemthv authors: Barauna, V. G.; Singh, M. N.; Barbosa, L. L.; Marcarini, W. D.; Ribeiro-Rodrigues, R.; Vassallo, P. F.; Mill, J. G.; Warnke, P. H.; Martin, F. L. title: Ultra-rapid on-site detection of SARS-CoV-2 infection using simple ATR-FTIR spectroscopy and analysis algorithm: high sensitivity and specificity date: 2020-11-04 journal: nan DOI: 10.1101/2020.11.02.20223560 sha: doc_id: 308527 cord_uid: scxemthv There is an urgent need for ultra-rapid testing regimens to detect the SARS-CoV-2 [Severe Acute Respiratory Syndrome Coronavirus 2] virus infections in real-time within seconds to stop its spread. Current testing approaches for this RNA virus focus primarily on diagnosis by RT-qPCR, which is time-consuming, costly, often inaccurate and impractical for general population rollout due to the need for laboratory processing. The latency until the test result arrives with the patient has led to further virus spread. Furthermore, latest antigen rapid tests still require 15 to 30 min processing time and are challenging to handle. Despite increased PCR-test and antigen-test efforts the pandemic has entered the worldwide second stage. Herein, we applied a superfast reagent-free and non-destructive approach of attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy with subsequent chemometric analysis to the interrogation of virus-infected samples. Contrived samples with inactivated gamma-irradiated Covid-19 virus particles at levels down to 1582 copies/ml generated infrared (IR) spectra with good signal-to-noise ratio. Predominant virus spectral peaks are associated with nucleic acid bands, including RNA. At low copy numbers, the presence of virus particle was found to be capable of modifying the IR spectral signature of saliva, again with discriminating wavenumbers primarily associated with RNA. Discrimination was also achievable following ATR-FTIR spectral analysis of swabs immersed in saliva variously spiked with virus. 
Following on, we nested our test system in a clinical setting wherein participants were recruited to provide demographic details, symptoms, parallel RT-qPCR testing and the acquisition of pharyngeal swabs for ATR-FTIR spectral analysis. Initial categorisation of swab samples into negative versus positive Covid-19 infection was based on symptoms and PCR results. Following training and validation of a genetic algorithm-linear discriminant analysis (GA-LDA) algorithm, a blind sensitivity of 95% and specificity of 89% were achieved. This prompt approach generates results within two minutes and is applicable in areas with increased people traffic that require immediate test results, such as airports, events or gate controls.
In early 2020, a new strain of coronavirus called SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2), the cause of the disease more commonly known as Covid-19, gave rise to a global pandemic (1). Starting from an epidemic outbreak in Wuhan (China), the virus quickly spread westwards towards Europe and the USA (2) with serious health and socio-economic consequences worldwide (3). SARS-CoV-2 exhibits a high propensity for infectious spread throughout populations (2). Every Covid-19 positive case, if not contained, can readily spread to two or more people, giving a high R number (4). Some countries, such as South Korea, initially successfully fought the Covid-19 outbreak. This success was based on the key aspects (5) of: (a) prevention, via good cleaning practices and isolation of potential cases; (b) testing, to identify those infected and to precisely isolate risk cases; and (c) anti-viral treatment and, in the future, a vaccine. Testing is fundamental to identify infected people and regions of risk (6).
This can enable intelligent isolation of areas without affecting an entire country's economy and allow allocation of resources to more strategically fight the disease, with more ventilators, medication and medical staff assigned to regions with more diagnosed cases. The main challenges for testing are cost and, in particular, time. Gold-standard diagnosis by RT-qPCR is costly, with a shortage of testing facilities even in developed countries, and can take >2 days to return a result because specimens have to be transported for processing to often distant laboratories (7). This is not suitable for mass testing (8). Despite globally increased PCR-test efforts, the pandemic was not brought to a halt. On the contrary, there is a recurrence and second wave of the disease, because many infectious patients spread the disease while waiting for their PCR-test results. Some companies are developing quicker and lower-cost tests based on novel sensors (9). Alternative antigen- or antibody-detection approaches remain affected by low specificity (that is, healthy patients could be wrongly classified as Covid-19 positive), thus creating statistical bias that could directly affect public health policies (10). Thus, there is a need to develop COVID-19 test approaches that can deliver results in real time and on-site. Vibrational spectroscopy, including attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy, has been widely used to discriminate and classify normal and pathological populations using different cell types, tissues or biofluids (11, 12, 13).
medRxiv preprint doi: https://doi.org/10.1101/2020.11.02.20223560; this version posted November 4, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Readily accessible biofluids, such as blood plasma/serum, saliva or urine, are considered ideal for clinical implementation due to routine methods of collection, as well as minimal sample preparation (14). Interrogation of samples with infrared (IR) spectroscopic techniques allows for the generation of a "spectral fingerprint", which subsequently facilitates the discrimination of the different populations and identification of potential biomarkers (15). In the past few years, biofluid-based ATR-FTIR spectroscopy has been used for diagnosing, screening or monitoring the progression/regression of a variety of diseases (16). Spectroscopic techniques are rapid, cost-effective and non-destructive, which makes them a perfect candidate for translation to the clinic. As a readily accessible non-invasive biofluid, saliva is an ideal candidate to facilitate disease detection; indeed, oral health has long been known to be an indicator of whole organism health (17). Herein, ATR-FTIR spectroscopy was used to interrogate saliva samples on pharyngeal swabs taken from individuals with or without suspected infection with Covid-19. Unlike many tests developed using laboratory-based contrived specimens, we trialled the approach in clinical settings on real-world samples. Our goal was to differentiate individuals with active infection based on a series of spectral biomarkers. We also took into consideration symptoms and other demographic features of our participants as confounding factors.
We propose a new, ultra-rapid on-site method to detect Covid-19 based on pharyngeal swabs using IR light, with potential for ready implementation in general population settings.
Figure 1a shows a typical spectrum of inactivated gamma-irradiated Covid-19 virus particles (SARS.CoV2/SP02.2020.HIAE.Br, GenBank accession number MT126808.1) (18); at 1582 copies/ml, an ATR-FTIR spectrum with good signal-to-noise ratio (SNR) is obtained. This measurement was made to assess the limit of detection (LoD) for biospectroscopy, that is, the minimum concentration at which the virus could be detected by IR spectroscopy. Below this level, the spectrum becomes noisy and the SNR poor. This points to the ability of ATR-FTIR spectroscopy to extract a unique viral fingerprint consisting of spectral features associated with a pure virus spectrum. It is interesting to note that the predominant spectral peaks are associated with nucleic acid bands, including RNA.
[...] control saliva in comparison with saliva spiked with inactivated virus particles at various copy number levels highlighted an ability to detect virus particle-induced spectral alterations at levels that would be considered extremely low in the pharyngeal cavity of infected humans (symptomatic or asymptomatic). Even more compellingly, when this is examined using basic multivariate analysis (i.e., PCA), the IR spectral signature of pure inactivated virus segregates away from control saliva in a scores plot (Figure 2c). When saliva is spiked with exceptionally low levels of virus (781 copies/ml; a1 cluster below), the spectral points co-cluster with control saliva spectral points, suggesting no differences.
However, at a level of 12,500 copies/ml (a2 cluster below), there is segregation away from the control. It is critical to note that the loadings plot specifically identifies RNA as being proportional to virus levels (Figure 2b). The loadings on PC1 show the bands responsible for increase of virus concentration (nucleic acid bands), and the loadings on PC2 show the bands responsible for discrimination between saliva and virus (Amide I and Amide II bands present in saliva but not virus). Other, primarily protein-associated bands discriminate the saliva from the virus; we believe this to be the first report of its kind using biospectroscopy. Furthermore, in the complex milieu of a saliva sample, which will undoubtedly contain a range of complex constituents including aqueous components, exfoliated cellular material, post-infection immunoglobulins such as IgA and other individual or contaminating factors, a multivariate chemometric approach can still extract the viral-associated discriminating features. Following this, Figure 3 shows the analysis of swabs spiked with saliva, with or without gamma-irradiated Covid-19 virus particles. Figures 3a and 3b show spectra with good SNR. In the subsequent PCA scores plots, the spectral data points for virus-spiked saliva swabs segregate away from swab or control saliva swab categories (Figure 3c). This is achieved at low copy numbers.
The loadings on PC1 show the bands responsible for separation between swab + saliva and swab + saliva + virus (Amide I and Amide II bands of proteins), and the loadings on PC2 show the bands responsible for variation of virus concentration (Amide I, Amide II and nucleic acid bands) (Figure 3d). Unlike saliva alone, the swab sample contains bands in the nucleic acid region plus Amide I and Amide II bands that may come from the saliva itself.
[...] Consequently, five GA-LDA selected variables were identified, each of which significantly (P <0.01) discriminates negative and Covid-19 positive swab samples (Figure 6, Table 2). Using saliva swab-based vibrational spectroscopy we achieved results with significant clinical relevance. ATR-FTIR spectroscopy has been shown to be capable of distinguishing between patient and healthy groups, that is, between Covid-19 positive and negative swab samples. The plausible mechanistic basis for this is that the prominent distinguishing features extracted are primarily associated with nucleic acids, RNA in particular.
This study was carried out in agreement with the Helsinki declaration and authorized by the Hospitals Directive, due to the emergency situation. Ethical approval for the investigation was granted by the Ethics Committee of the Federal University of Espírito Santo (#0993920.1.0000.5071 and #31411420.9.0000.8207). Full ethical approval was given to undertake the studies described herein. All procedures and possible risks were explained to participants before they provided written consent.
Pharyngeal cotton swabs (FirstLab, Brazil) were collected from individuals >18 y who came to one of the six hospitals participating in the study and met the criteria for suspected cases according to the State Health Secretary and World Health Organization (WHO) guidelines between June and September 2020. For all participants, demographic data (age, gender, pre-existing medical conditions, symptoms, date of symptom onset) were collected. Participants with inconclusive RT-qPCR results after two rounds of RT-qPCR were excluded.
For the gold standard protocol via diagnosis by RT-PCR, a nasopharyngeal swab was collected from participants by inserting a rayon swab with a plastic shaft into the nostril parallel to the palate. The swab was inserted to a location equidistant from the nostril and the outer opening of the ear and was gently scraped for a few seconds to absorb secretions. The swab was then placed immediately into a sterile tube containing viral transport medium. RT-PCR was performed in the central laboratory of the Health Secretary of Espírito Santo (LACEN-SESA) to allow definitive diagnosis of COVID-19 infection. For ATR-FTIR spectroscopy, a pharyngeal swab was collected from participants by inserting a cotton swab into the mouth and scraping the tonsils, the tongue and the inner part of the cheek. The swab was then placed immediately into a sterile tube and stored on ice until analysis. Samples were taken at the same time as the nasopharyngeal swabs for PCR testing. In the clinical setting, all PCRs were locally or nationally approved tests. All samples were analysed at the same state-approved laboratory.
[...] employs primers and probes for the N1, N2, and RP genes. The second was Maccura (designed by Maccura Biotechnology Co., Hi-tech Zone, Chengdu, China), which is a single-well triple target assay and identifies three genes from SARS-CoV-2 (E, N, and ORF1ab) and provides a separate positive internal control (IC). The third was the Molecular SARS-CoV-2 (E/RP genes) kit (Instituto de Tecnologia [...]
[...] spectroscopic measurement, three spectra were obtained from each saliva swab. Each swab analysis was performed with 32 co-additions, interspersed with 32 background scans. After each analysis, the swab was removed from the crystal and the crystal was cleaned with Milli-Q water and 70% alcohol, thus avoiding inter-sample contamination. For spiking experiments, gamma-irradiated inactivated Covid-19 virus particles (from a stock solution of 1 × 10^5 copies/ml deionised water) were mixed in various copy number concentrations in saliva taken from a 42-y-old male classed as negative for infection. The following protocols were undertaken:
1- Four µl of gamma-radiation inactivated Covid-19 virus solution was applied to the ATR diamond and left to dry for 4-5 min. Serial dilutions of the virus in deionized water were then analysed in a similar fashion.
2- A series of serial dilutions in saliva from a negative study participant were generated, applied to the ATR diamond and left to dry for 4-5 min.
3- Fifteen µl of saliva spiked with gamma-radiation inactivated virus (step 2) were added to a cotton swab. The saliva cotton swab was then applied straight to the ATR diamond and immediately analysed.
Pre-processing and data analysis were carried out using MATLAB 2014b (The MathWorks, MA, USA). The spectra were pre-processed by truncating to the fingerprint region (1,800-900 cm^-1), followed by Savitzky-Golay smoothing (9-point window, 2nd-order polynomial fitting), automatic weighted least squares baseline correction and vector normalisation. For exploratory data analysis, following pre-processing of raw spectra, spectral data were mean-centred and evaluated by means of principal component analysis (PCA) (21). PCA is an unsupervised technique that reduces the spectral data space to principal components (PCs) responsible for the majority of variance in the original dataset. Each PC is orthogonal to the others; the first PC accounts for the maximum explained variance, followed by the second PC and so on. The PCs are composed of scores and loadings, where the former represent the variance in the sample direction, thus being used to assess similarities/dissimilarities among the samples; and the latter represent the contribution of each variable to the model decomposition, thus being used to find important spectral markers. This technique looks for inherent similarities/differences and provides a scores matrix representing the overall "identity" of each sample; a loadings matrix representing the spectral profile in each PC; and a residual matrix containing the unexplained data. Scores information can be used for exploratory analysis providing possible classification between data classes.
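The pre-processing chain described above (truncation to the fingerprint region, Savitzky-Golay smoothing and vector normalisation) can be sketched in a few lines. This is an illustrative stand-in written in Python, not the authors' MATLAB code: the function names are hypothetical, the automatic weighted least squares baseline correction is deliberately omitted, and the handling of the window edges is an assumption made for brevity. The smoothing weights are the standard tabulated coefficients for a 9-point window and a 2nd-order polynomial.

```python
import math

# Standard Savitzky-Golay smoothing weights for a 9-point window and a
# 2nd-order polynomial; the weights sum to 231.
SG9_QUADRATIC = [-21, 14, 39, 54, 59, 54, 39, 14, -21]
SG9_NORM = 231


def truncate(wavenumbers, absorbance, low=900.0, high=1800.0):
    """Keep only the points inside the fingerprint region (cm^-1)."""
    kept = [(w, a) for w, a in zip(wavenumbers, absorbance) if low <= w <= high]
    return [w for w, _ in kept], [a for _, a in kept]


def savgol_smooth(values):
    """Apply 9-point quadratic Savitzky-Golay smoothing.

    Assumption for this sketch: the first and last four points, where the
    window does not fit, are returned unsmoothed.
    """
    half = len(SG9_QUADRATIC) // 2
    out = list(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        out[i] = sum(c * v for c, v in zip(SG9_QUADRATIC, window)) / SG9_NORM
    return out


def vector_normalise(values):
    """Scale a spectrum to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def preprocess(wavenumbers, absorbance):
    """Truncate, smooth and normalise one spectrum.

    The baseline-correction step named in the text is NOT implemented here.
    """
    wn, ab = truncate(wavenumbers, absorbance)
    return wn, vector_normalise(savgol_smooth(ab))
```

Calling `preprocess(wavenumbers, absorbance)` on one raw spectrum returns the truncated wavenumber axis and a unit-length smoothed spectrum; in practice a library routine with proper edge handling and a baseline step would replace this sketch.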
PCA was the method of choice for analysing saliva samples spiked with inactivated virus particles. It is simple, fast, and combines exploratory analysis, data reduction, and feature extraction into one single method. PCA scores were used to explore overall dataset variance and any clustering related to limit of detection, while the loadings on the first two PCs were used to derive specific biomarkers indicative of infection category. Genetic algorithm (GA) is a variable selection technique used to reduce the spectral data space into a few variables and works by simulating the data throughout an evolutionary process (22, 23). The original space is maintained for both algorithms, and no transformation is made as in PCA. Therefore, the selected variables have the same meaning as the original ones (i.e., wavenumbers), and they [...] where $g_n$ is defined as

$$g_n = \frac{r^2\left(x_n, m_{I(n)}\right)}{\min_{I(m) \neq I(n)} r^2\left(x_n, m_{I(m)}\right)}$$

where the numerator is the squared Mahalanobis distance between the object $x_n$ and the sample mean $m_{I(n)}$ of its true class; and the denominator is the squared Mahalanobis distance between the object $x_n$ and the mean of the closest wrong class (20). The GA calculations were performed over 100 generations with 200 chromosomes each. One-point crossover and mutation probabilities were set to 60% and 10% respectively. GA is a non-deterministic algorithm, which can give different results when the same model is run repeatedly. Therefore, the algorithm was repeated
three times, starting from random initial populations, with the best solution resulting from the three realizations of GA employed. Sensitivity (probability that a test result will be positive when disease is present) and specificity (probability that a test result will be negative when disease is not present) were given by the following equations:

$$\text{Sensitivity (\%)} = \frac{TP}{TP + FN} \times 100$$

$$\text{Specificity (\%)} = \frac{TN}{TN + FP} \times 100$$

where TP is defined as true positive; FN as false negative; TN as true negative; and FP as false positive.

[...] Mix between saliva and saliva + virus for low concentration (≤781 copies/mL); c2.: Mix between pure virus and saliva + virus for high concentration (≥1.25×10^4 copies/mL). Pre-processing: Savitzky-Golay (SG) smoothing (7-point window, 2nd-order polynomial fitting) and baseline correction. The loadings on PC1 show the bands responsible for increase of virus concentration (nucleic acid bands), and the loadings on PC2 show the bands responsible for discrimination between saliva and virus (Amide I and Amide II bands present in saliva but not virus).
Figure 3. (a) Average raw spectra and (b) pre-processed spectra for Swab + Saliva (n = 5), and Swab + Saliva + Virus (n = 54, 1×10^5 to 98 copies/mL). (c) PCA scores and (d) PCA loadings on PC1 vs. PC2 for the pre-processed data. Inset c1.: virus concentration around 6.25×10^3 copies/mL; c2.: virus concentration around 1.56×10^3 copies/mL; c3.: virus concentration ≤ 781 copies/mL. Pre-processing: Savitzky-Golay (SG) smoothing (7-point window, 2nd-order polynomial fitting) and baseline correction. The loadings on PC1 show the bands responsible for separation between swab + saliva and swab + saliva + virus (Amide I and Amide II bands of proteins), and the loadings on PC2 show the bands responsible for variation of virus concentration (Amide I, Amide II and nucleic acid bands). Unlike saliva alone, the swab sample contains bands in the nucleic acid region plus Amide I and Amide II bands that may come from the saliva itself.
Tables

Table 1. Confusion matrix and figures of merit for the validation set using the GA-LDA algorithm.*

              Predicted Negative    Predicted Positive
Negative      54                    7
Positive      1                     19

Parameters
Accuracy      90%
Sensitivity   95%
Specificity   89%
F-Score       92%

*Pre-processing: Savitzky-Golay smoothing (9-point window, 2nd-order polynomial fitting), automatic weighted least squares baseline correction and vector normalisation.
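As a worked check, the figures of merit in Table 1 can be recomputed from the four confusion-matrix counts. The sketch below is illustrative only and the function name is hypothetical. Sensitivity and specificity follow the TP/FN/TN/FP definitions given in the methods; the accuracy formula and, in particular, the F-score formula (taken here as the harmonic mean of sensitivity and specificity) are assumptions, chosen because they reproduce the reported 90% and 92%.

```python
def figures_of_merit(tn, fp, fn, tp):
    """Return accuracy, sensitivity, specificity and F-score as percentages.

    Assumption: F-score = 2 * SENS * SPEC / (SENS + SPEC). The source does
    not state the formula; this form reproduces the reported value.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_score = 2 * sensitivity * specificity / (sensitivity + specificity)
    return {
        "accuracy": 100 * accuracy,
        "sensitivity": 100 * sensitivity,
        "specificity": 100 * specificity,
        "f_score": 100 * f_score,
    }


# Counts read from Table 1: rows are the true class, columns the prediction.
table1 = figures_of_merit(tn=54, fp=7, fn=1, tp=19)
rounded = {name: round(value) for name, value in table1.items()}
# rounded holds 90 (accuracy), 95 (sensitivity), 89 (specificity), 92 (F-score)
```

With only 20 positive samples in the validation set, a single additional false negative would lower the sensitivity from 95% to 90%, which is worth keeping in mind when reading the headline figures.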
References
1. Coronavirus infections and immune responses
2. Harnessing the immune system to overcome cytokine storm and reduce viral load in COVID-19: a review of the phases of illness and therapeutic agents
3. COVID-19: A Make or Break Moment for Global Policy Making
4. Estimating the impact of mobility patterns on COVID-19 infection rates in 11 European countries
5. Implementation Science to Respond to the COVID-19 Pandemic
6. Considerations for diagnostic COVID-19 tests
7. Empowering academic labs and scientists to test for COVID-19
8. Detecting the Coronavirus (COVID-19)
9. Halvorsen K. Programmable low-cost DNA-based platform for viral RNA detection
10. Lessons from COVID-19 on the role of the state and the market in providing early testing
11. Fourier Transform Infrared (FTIR) Spectroscopy of Biological Tissues
12. Using Fourier transform IR spectroscopy to analyze biological materials
13. Vibrational spectroscopy of biofluids for disease screening or diagnosis: translation from the laboratory to a clinical setting
14. Spectrochemical analysis of liquid biopsy harnessed to multivariate analysis towards breast cancer screening
15. Distinguishing cell types or populations based on the computational analysis of their infrared spectra
16. Attenuated total reflection Fourier-transform infrared spectral discrimination in human bodily fluids of oesophageal transformation to adenocarcinoma
17. Diagnostic Biomarkers for Alzheimer's Disease Using Non-Invasive Specimens
18. SARS-CoV-2 isolation from the first reported patients in Brazil and establishment of a coordinated task network
19. Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR
20. Classification of cervical cytology for human papilloma virus (HPV) infection using biospectroscopy and variable selection techniques
21. Principal component analysis
22. Differential diagnosis of Alzheimer's disease using spectrochemical analysis of blood
23. Segregation of ovarian cancer stage exploiting spectral biomarkers derived from blood plasma or serum analysis: ATR-FTIR spectroscopy coupled with variable selection methods