key: cord-343586-28ezisog authors: Rocca, María Florencia; Zintgraff, Jonathan Cristian; Dattero, María Elena; Santos, Leonardo Silva; Ledesma, Martín; Vay, Carlos; Prieto, Mónica; Benedetti, Estefanía; Avaro, Martín; Russo, Mara; Nachtigall, Fabiane Manke; Baumeister, Elsa title: A Combined approach of MALDI-TOF Mass Spectrometry and multivariate analysis as a potential tool for the detection of SARS-CoV-2 virus in nasopharyngeal swabs date: 2020-05-07 journal: bioRxiv DOI: 10.1101/2020.05.07.082925 sha: doc_id: 343586 cord_uid: 28ezisog Coronavirus disease 2019 (COVID-19) is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The rapid, sensitive and specific diagnosis of SARS-CoV-2 by fast and unambiguous testing is widely recognized to be critical in responding to the ongoing outbreak. Since the current testing capacity of RT-PCR-based methods is being challenged due to the extraordinary demand of supplies, such as RNA extraction kits and PCR reagents worldwide, alternative and/or complementary testing assays should be developed. Here, we exploit the potential of mass spectrometry technology combined with machine learning algorithms as an alternative fast tool for SARS-CoV-2 detection from nasopharyngeal swabs samples. According to our preliminary results, mass spectrometry-based methods combined with multivariate analysis showed an interesting potential as a complementary diagnostic tool and further steps should be focused on sample preparation protocols and the improvement of the technology applied. The novel coronavirus disease 2019 , caused by the SARS-CoV-2 virus, was declared a pandemic by the World Health Organization on March 12 th 2020 following its emergence in Wuhan China. As of the April 4 th 2020 there were over 1.2M confirmed cases of COVID-19 in 175 countries, with over 65,000 fatalities (Johns Hopkins Coronavirus Resource Center, 2020). SARS-CoV-2 is one of four new pathogenic viruses which have jumped from animal to human hosts over the past 20 years, and the current pandemic sends warning signs about the need for preparedness and associated research. Clinical presentation of COVID-19 ranges from mild to severe with a high proportion of the population having no symptoms yet being equally infectious (Yang et al., 2020; Zhou et al., 2019) . Together, these features have led to intensive lockdown measures in most countries with the aim to restrict the spread of the virus, limit the burden on healthcare systems and reduce mortality rate. In parallel, there has been an extraordinary response from the scientific community. These collective efforts aim to understand the pathogenesis of the disease, to evaluate treatment strategies and to develop a vaccine at unprecedented speeds in order to minimize its impact on individuals and on the global economy (Li, 2016; Wenzhong and Hualan, Preprint) . The rapid, sensitive and specific diagnosis of SARS-CoV-2 by fast and unambiguous testing is widely recognized to be critical in responding to the outbreak. Since the current testing capacity of RT-PCR-based methods is being challenged due to the extraordinary global demand of supplies such as RNA extraction kits and PCR reagents, alternative and/or complementary testing assays need to be deployed now in an effort to accelerate our understanding of COVID-19 disease (Chin et al., 2020; Antezack et al., 2020) . The aim of this work was to assess the potential of MALDI-TOF MS technology to create mass spectra from nasopharyngeal swabs in order to find specific discriminatory peaks by using machine learning algorithms, and whether those peaks were able to differentiate COVID-19 positive samples from COVID-19 negative samples. Sample preparation and MALDI-TOF data acquisition. Samples. First, we analyzed in triplicate 25 samples of nasopharyngeal swab, preliminary tested by RT-PCR (Corman et al., 2020) All spectra were verified using the Flex Analysis v3.4 software (Bruker Daltonics, Bremen, Germany). The spectra selected for model generation and classification were treated according to a standard workflow including the following steps: baseline subtraction, normalization, recalibration, average spectra calculation, average peak list calculation, peak calculation in the individual spectra and normalization of peak list for model generation. Main Spectrum Profiles (MSPs) were performed according to manufacturer's instructions. All 25 samples were used to build an "in-house" database with Maldi Biotyper OC V3.1 (Bruker Daltonics, Bremen, Germany); in addition, dendrograms were performed to assess the relatedness of these MSPs using default settings. Spectra files from MSPs were exported as mzXML files using CompassXport CXP3.0.5. (Bruker Daltonics, Bremen, Germany) for visual analysis. At the same time, a new database was created in Bionumerics v7.6.2 (Applied Maths, Ghent, Belgium) according to the manufacturers' instructions. All raw spectra were imported into the Bionumerics database with x-axis trimming to a minimum of 1960 m/z. Baseline subtraction (with a rolling disc with a size of 50 points), noise computing (continuous wavelet transformation, CWT), smooth (Kaiser Window with a window size of 20 points and beta of 10 points), and peak detection (CWT with a minimum signal to noise ratio of 1) were performed. Spectrum summarizing, peak matching and peak assignment was performed according to instructions from Bionumerics. In short, all raw spectra were summarized into isolate spectra, and peak matching, using the option "existing peak classes only", was performed on isolate spectra using a constant tolerance of 1.9, a linear tolerance of 550 and a peak detection rate of 10%. Binary peak matching tables were exported to summarize the presence of peak classes. By assigning biomarkers, only the presence and absence of peaks was investigated. To also assess quantitative peak data such as peak intensity and peak area, a multivariate unsupervised statistical tool PCA was additionally performed. All samples were examined for the unique masses (± 10 Da). Classifier models based on machine learning. (Stephens, 1974) . The best top ten peaks are summarized in Table 1 . For evaluating the performance of the different approaches mentioned, accuracy, sensitivity, specificity, positive prediction and negative prediction were calculated (ClinPro Tools 3.0: User Manual, 2011). The results were analyzed according to the following approaches: Biomarker findings, construction of an "in-house" database and through the design of predictive models by machine learning. Manual analysis of the spectra obtained by Flex Analysis v3.4 software revealed one potential peak of negativity which was not detected in most of the positive samples: 4551 Da. When BioNumerics software was used, another potential biomarker was found in 60% of negative samples and only in 16% of positives: 3142 Da. This peak is also detected with reproducible intensity in the average spectrum of that same class in ClinProTools (Fig. 2) . Evaluation of the novel "in-house" database. The novel database was challenged with 30 previously-characterized samples different than those used to create it. They were processed in the same way as The results of Recognition capacity (RC) and Cross-Validation (CV) values of all models used are summarized in Table 2 . for the GA 2 classes and SNN 2 classes but correctly identified by the GA 3 classes (not included in the statistical analysis). Considering the SARS-CoV-2 undetectable samples: 91% (10/11) were correctly identified and only 1 sample was miss classified (Table 3) . Discussion MALDI-TOF MS is a simple, inexpensive and fast technique that analyses protein profiles with a high reliability rate, and could be used as a rapid screening method in a large population (Croxatto et al., 2012) . These preliminary results suggest that MALDI-TOF MS coupled with ClinProTools software represents an interesting alternative as a screening tool for diagnosis of SARS-CoV-2, especially because of the good performance and accuracy obtained with samples in which viral presence was not detected. More samples need to be analyzed in order to make a definitive statement. However, this study using MALDI-TOF combined with machine learning has proven to be, as far as we know, a revolutionary alternative that deserves further development. The identification of specific biomarkers responsible for each peak or group of peaks represents a difficult and demanding task that requires further specific studies. Based on the promising preliminary results, we should focus on the improvement of this potential diagnosis approach by assaying various techniques for proteins extraction to the clinical samples and on the expansion of the complementary database in the near future. These results constitute the basis for further research and we encourage researchers to explore the potential of MALDI-TOF MS in order to assess the feasibility of this technology, widely available in clinical microbiology laboratories, as a fast and inexpensive SARS-CoV-2 diagnostic tool. Rapid diagnosis of periodontitis, a feasibility study using MALDITOF mass spectrometry ClinPro Tools 3.0: User Manual Stability of SARS-CoV-2 in different environmental conditions. The Lancet Microbe Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR COVID-19 Map -Johns Hopkins Coronavirus Resource Center Applications of MALDI-TOF mass spectrometry in clinical diagnostic microbiology Structure, function, and evolution of coronavirus spike proteins EDF statistics for goodness of fit and some comparisons COVID-19: Attacks the 1-Beta Chain of Hemoglobin and Captures the Porphyrin to Inhibit Human Heme Metabolism COVID-19) : interim guidance Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a singlecentered, retrospective, observational study Coronavirus disease 2019 (COVID-19): a clinical update