key: cord-0003154-izs6g3kn authors: Purcaro, Giorgia; Rees, Christiaan A; Wieland-Alter, Wendy F; Schneider, Mark J; Wang, Xi; Stefanuto, Pierre-Hugues; Wright, Peter F; Enelow, Richard I; Hill, Jane E title: Volatile fingerprinting of human respiratory viruses from cell culture date: 2018-03-01 journal: J Breath Res DOI: 10.1088/1752-7163/aa9eef sha: 186fa1c35cc687d9c59b234f8327a881ec960658 doc_id: 3154 cord_uid: izs6g3kn Volatile metabolites are currently under investigation as potential biomarkers for the detection and identification of pathogenic microorganisms, including bacteria, fungi, and viruses. Unlike bacteria and fungi, which produce distinct volatile metabolic signatures associated with innate differences in both primary and secondary metabolic processes, viruses are wholly reliant on the metabolic machinery of infected cells for replication and propagation. In the present study, the ability of volatile metabolites to discriminate between respiratory cells infected and uninfected with virus, in vitro, was investigated. Two important respiratory viruses, namely respiratory syncytial virus (RSV) and influenza A virus (IAV), were evaluated. Data were analyzed using three different machine learning algorithms (random forest (RF), linear support vector machines (linear SVM), and partial least squares-discriminant analysis (PLS-DA)), with volatile metabolites identified from a training set used to predict sample classifications in a validation set. The discriminatory performances of RF, linear SVM, and PLS-DA were comparable for the comparison of IAV-infected versus uninfected cells, with area under the receiver operating characteristic curves (AUROCs) between 0.78 and 0.82, while RF and linear SVM demonstrated superior performance in the classification of RSV-infected versus uninfected cells (AUROCs between 0.80 and 0.84) relative to PLS-DA (0.61). 
A subset of discriminatory features were assigned putative compound identifications, with an overabundance of hydrocarbons observed in both RSV- and IAV-infected cell cultures relative to uninfected controls. This finding is consistent with increased oxidative stress, a process associated with viral infection of respiratory cells. Infections of the lower respiratory tract, including both influenza and pneumonia, are among the top 10 leading causes of death in the United States [1] , and pneumonia remains one of the world's leading causes of death for children under the age of five [2] . According to the Centers for Disease Control and Prevention (CDC), approximately 30% of acute respiratory infections of viral etiology in the United States (roughly 47 million cases annually) are inappropriately treated with antimicrobial therapies that are not effective against viral pathogens [3] [4] [5] . Furthermore, it is estimated that a causative pathogen is identified in only approximately 40% of pneumonia cases overall, and a subset of these cases for which a pathogen could not be identified are likely of viral etiology [6] . A diagnostic capable of rapidly distinguishing between infections of viral, bacterial, or fungal etiology could inform the clinical management of individuals with respiratory infections, potentially reducing the inappropriate use of antibiotics for viral infections [7, 8] . Limitations of currently-available diagnostic tools for the detection of lower respiratory infections are mainly related to the difficulty of obtaining an adequate sputum sample (e.g., sputum is not produced by most children) and in differentiating between infection and colonization in the setting of a positive result [9] . 
In particular, results from tests that target organisms such as Staphylococcus aureus, Streptococcus pneumoniae, Haemophilus influenzae, or certain fungi (i.e., Candida) must be interpreted with care, as up to 20% of healthy individuals can be asymptomatically colonized [10]. Several rapid, multiplex diagnostic tests for organism detection are commercially available [8, 11, 12], but their role at present is limited: in addition to the previously mentioned shortcomings, their selectivity and specificity have not been properly evaluated, mainly because an indisputable gold-standard technique is lacking for the identification of many pathogens [8, 10-13]. To date, most assays for the detection of respiratory viruses have focused on the identification of either virally derived nucleic acids (e.g., multiplex PCR, such as PneumoVir®) or antigens (e.g., rapid influenza immunoassays, such as Directigen™ EZ Flu A+B). Recently, however, volatile metabolites in exhaled breath have been investigated as potential alternative biomarkers for pathogen detection and identification. For example, volatile metabolites in breath are widely used in the diagnosis of Helicobacter pylori gastritis [14], and are under investigation for the diagnosis of both acute and chronic respiratory infections [15]. In the murine model, it has been shown that volatile metabolites can discriminate between respiratory infections caused by common bacterial pathogens, including H. influenzae, Klebsiella pneumoniae, Legionella pneumophila, Moraxella catarrhalis, Pseudomonas aeruginosa, S. aureus, and S. pneumoniae [16-18]. However, unlike bacteria, which produce distinct volatile metabolic signatures derived from fundamental differences in components of both core and secondary metabolism [19], viruses are entirely reliant on the metabolic machinery of infected cells.
Several transcriptomics studies have demonstrated that different infectious agents (both viruses and bacteria) trigger specific pattern-recognition receptors expressed on host immune cells, activating different transcription factors that in turn activate specific metabolic programs [20-30]. For instance, the cytokine profile induced by influenza A virus (IAV) infection in infants is distinct from the profile induced by respiratory syncytial virus (RSV) [30]. In light of these findings, we hypothesized that volatile metabolic signatures could differentiate between virally-infected and uninfected cells. In addition to assessing the diagnostic utility of such an approach, the study of volatile metabolites produced during infection has the potential to generate insight into viral pathogenesis. To date, few studies have focused on the identification of volatile metabolites produced by cell cultures infected with virus (i.e., influenza, RSV, human rhinovirus, adenovirus, and herpes simplex) [31-35]. These studies involved basic characterization of the headspace of infected versus uninfected cell culture, but did not evaluate the discrimination capability of the volatile metabolites produced during infection. The aim of this study is therefore to generate volatile fingerprints of cell cultures infected with virus (both RSV and IAV) and to evaluate their discrimination capability. Volatile metabolites were extracted from the headspace using solid-phase microextraction (SPME) and then separated and identified by comprehensive two-dimensional gas chromatography (GC×GC) hyphenated with a time-of-flight (ToF) mass spectrometer (MS). The present study represents a novel application of this technique, which is particularly well-suited for the analysis of complex mixtures and is amongst the most powerful analytical tools available today for the analysis of volatile metabolites [36].
Using different machine learning algorithms, we were able to identify volatile metabolic patterns that could discriminate between cells infected with virus and those that were uninfected.
Six-well microtiter plates were seeded with HEp-2 cells (a human laryngeal cancer cell line) from the American Type Culture Collection (ATCC®, CL-23™) (4 × 10⁵ cells/well) to be 70%-80% confluent in 24 h. Human RSV (ATCC® VR-1540™) was diluted to a multiplicity of infection (MOI) of 0.3 in phosphate-buffered saline. HEp-2 cells were maintained in a growth medium consisting of Minimum Essential Medium (MEM) (Corning CellGro 15-010) containing penicillin (100 units ml⁻¹) and streptomycin (100 μg ml⁻¹) (Hyclone, Pittsburgh, PA, USA), and 10% fetal bovine serum (FBS). For viral infection, the culture supernatant was removed, and cells were inoculated with 0.5 ml of the viral suspension. Plates were incubated at 37°C with a 5% CO₂ atmosphere, with gentle shaking/rocking every 30 min for 1.5 h. After this initial incubation, the supernatant was aspirated and each well was overlaid with 3.0 ml of MEM containing penicillin-streptomycin and 2% FBS (Corning CellGro 15-010). At 5, 24, 48, and 72 h after the initial inoculation, a microtiter plate was sampled by collecting 2.5 ml of media from each well in a 10 ml air-tight glass vial sealed with a PTFE/silicone cap (both from Sigma-Aldrich) and frozen at −30°C. At each sampling time, six replicates each of RSV-infected and uninfected cells were collected.
Aliquots of 500 000 MLE-Kd cells (a mouse lung epithelial cell line) maintained in 100 μl of 1X Dulbecco's Modified Eagle Medium (DMEM) (containing glucose, L-glutamine, and sodium pyruvate; Mediatech) were infected on ice for 20 min with 10 μl of a stock of A/PR8/34 H1N1 influenza virus, titrated at ∼1 × 10⁸ TCID₅₀ (50% tissue culture infective dose) units per ml, corresponding to an MOI of 1.
The suspensions were pipetted into 6-well polystyrene tissue culture plates containing 3 ml per well of prewarmed complete media (1X DMEM, 10% FBS, 200 U each of penicillin and streptomycin, and 2 mM extra L-glutamine) (Hyclone). Plates were swirled to mix and incubated at 37°C with a 5% CO 2 atmosphere. At 24, 49, 72, and 122 h, 2.5 ml supernatant for each well were collected into a 10 ml air-tight glass vial sealed with a PTFE/silicone cap (Sigma-Aldrich) and frozen at −30°C. Controls consisting of uninfected cells in media were incubated and collected in parallel. At each sampling time, six replicates each of IAV-infected and uninfected cells were collected. All samples were analyzed within one month of collection. Volatile metabolites were extracted using a divinylbenzene/carboxen/polydimethylsiloxane (DVB/CAR/PDMS) d f 50/30 μm, 2 cm length fiber from Supelco (Bellefonte, PA, USA). The fiber was conditioned before use. Samples (agitated at 250 rpm) were incubated for 15 min at 37°C before fiber exposure for 30 min at the same temperature. The fiber was introduced into the GC injector for thermal desorption for 1 min at 250°C in splitless mode. Restek (Bellefonte, PA, USA). The carrier gas was helium, at a flow rate of 2 ml min −1 . The primary oven temperature program was 35°C (hold 1 min) ramped to 230°C at a rate of 5°C min −1 . The secondary oven and the thermal modulator were offset from the primary oven by +5°C and +25°C, respectively. A modulation period of 2.5 s (alternating 0.75 s hot and 0.5 s cold) was used. The transfer line temperature was set at 250°C. A mass range of m/z 30 to 500 was collected at a rate of 200 spectra/s following a 3 min acquisition delay. The ion source was maintained at 200°C. Data acquisition and analysis was performed using ChromaTOF software, version 4.50 (LECO Corp.). Chromatographic data were processed and aligned using ChromaTOF. 
For peak identification, a signal-to-noise (S/N) cutoff was set at 50:1 in at least one chromatogram and a minimum of 20:1 in all others. The resulting peaks were identified by a forward search of the NIST 2011 library. For putative peak identification, a forward match score of 800 (of 1000) was required. For the alignment of peaks across chromatograms, maximum first- and second-dimension retention time deviations were set at 6 s and 0.2 s, respectively, and the inter-chromatogram spectral match threshold was set at 600. Compounds eluting prior to 4 min and artifacts (e.g., siloxanes, phthalates, etc.) were removed prior to statistical analysis. A mixture of normal alkanes (C6-C20) and the Grob mixture [37] were used to calculate linear retention indices (LRI) and to evaluate the instrument and SPME performance, respectively. The same SPME and GC methods were used, except for the SPME exposure time, which was shorter (5 min) to avoid overloading the fiber. Discriminatory features were tentatively identified based on mass spectral similarities to the NIST 2011 mass spectral library, with a match score of at least 800 (of 1000) required for putative identifications. In addition, at least one of the following two criteria was required: (I) a probability of at least 5000 out of 10 000, and/or (II) an experimentally-determined LRI in agreement (i.e., within ±10) with data reported using the same stationary phase. For the latter information, three main sources were used, namely [38], an application note (http://blog.restek.com/wp-content/uploads/2013/04/624silms.pdf), and the Pro EZGC® Chromatogram Modeler (http://restek.com/proezgc) (the latter two both from Restek). Most hydrocarbons were generically assigned as 'alkylated hydrocarbons', as it is almost impossible to assign them a specific name based only on mass spectral similarity, due to the intense fragmentation of this class of compounds in the MS ion source.
However, the chemical class of these compounds can be assigned by considering both their location in the two-dimensional chromatogram and their mass spectral fragmentation pattern.
All statistical analyses were performed using R v3.3.2 (R Foundation for Statistical Computing, Vienna, Austria). Prior to statistical analyses, the relative abundance of compounds across chromatograms was normalized using Probabilistic Quotient Normalization [39]. Data were randomly subdivided into discovery (training) and validation (test) sets 100 times, with 2/3 of samples included in the discovery set, and the remaining 1/3 in the validation set. Three machine learning algorithms were used to identify the most highly discriminatory volatile metabolites and predict the class (i.e., cells infected with virus versus uninfected cells) to which samples in the validation set belonged, namely: random forest (RF) [40], support vector machines with a linear kernel (linear SVM) [41], and partial least-squares discriminant analysis (PLS-DA) [42]. Mean decrease in accuracy, feature weights, and variable importance in projection were used as the measures of variable importance for RF, linear SVM, and PLS-DA, respectively [43]. For each of the 100 discovery/validation splits, volatile compounds were ranked according to their discriminatory ability, and different feature inclusion thresholds (e.g., top 10%, 20%, 30%, etc.) were compared in terms of predictive ability. A compromise between the number of features included and model accuracy was obtained via the inclusion of the top 20% of features. The class probabilities were used to generate receiver operating characteristic (ROC) curves, and from these ROC curves, sensitivities, specificities, and area under the ROC curve (AUROC) were calculated. The optimal thresholds for class probabilities were calculated using Youden's J statistic [44], rather than the 0.5 cutoff that is traditionally applied to two-class classification problems.
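The ROC analysis described here can be illustrated with a short sketch. The analyses in this study were run in R; the Python below is only an illustration, and the function names (`roc_points`, `auroc`, `youden_threshold`) and the example labels and class probabilities are hypothetical, not taken from the study data. Youden's J is computed as sensitivity + specificity − 1, and the AUROC uses the rank (Mann-Whitney) formulation.

```python
def roc_points(labels, scores):
    """Return (threshold, sensitivity, specificity) for each distinct score.

    labels: 1 = infected, 0 = uninfected; a sample is called infected
    when its class probability is >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((t, tp / pos, 1 - fp / neg))
    return points


def auroc(labels, scores):
    """AUROC as the fraction of (infected, uninfected) pairs ranked correctly.

    Ties between an infected and an uninfected score count one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(labels, scores):
    """Return the ROC point maximizing J = sensitivity + specificity - 1."""
    return max(roc_points(labels, scores), key=lambda p: p[1] + p[2] - 1)


# Hypothetical validation-set labels and class probabilities.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.65, 0.35, 0.6, 0.4, 0.3, 0.2]

print(auroc(labels, scores))
print(youden_threshold(labels, scores))
```

In this toy example the J-optimal threshold differs from the conventional 0.5 cutoff, which is the reason the text gives for reporting data-driven thresholds.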
K-means clustering was used to identify groups of volatile metabolites that exhibited similar changes in relative concentration as a function of time, with the relative concentration defined as the difference in the chromatographic area (calculated based on the unique mass, A) between cells infected with virus and uninfected cells (A_infected − A_uninfected). The elbow method was used to estimate the optimal number of clusters for k-means clustering [45].
Prior to the statistical analysis of headspace volatiles, the stability of the HS-SPME GC×GC-ToF MS system was assessed using the Grob mixture, in terms of both retention time shift and area repeatability. Coefficients of variation (CV%) below 0.2% and 2% were obtained for first- and second-dimension retention times, respectively, for all peaks except 1-octanol, which presented a higher shift in the second dimension (about 20%, standard deviation of 0.2 s). This shift was taken into account in setting the alignment matching parameters. Area variation within 15% was obtained for all standards considered.
To identify volatile metabolic fingerprints that were discriminatory between cells infected with RSV and uninfected HEp-2 cells, the chromatographic data were first pre-processed to remove artifacts, reducing the total number of peak features from 358 to 216. These features were used for further data analysis. RF, linear SVM, and PLS-DA were used to identify the most highly discriminatory volatile metabolites in the discovery set, and predict the class to which samples in the validation set belonged. This process was repeated 100 times using unique discovery/validation splits for each iteration, and the most highly discriminatory volatile metabolites (top 20%, corresponding to 43 features per iteration) were retained and used to predict the class (i.e., virally-infected cells versus uninfected cells, pooling together the different time points) to which samples in the validation set belonged.
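The resampling and feature-retention steps can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study; `random_splits` and `top_features` are hypothetical helper names, the importance scores are made up, and the sample count of 48 is only an example. The sketch does not enforce uniqueness of the splits or stratify by class, and the source does not say how either was handled.

```python
import random


def random_splits(n_samples, n_repeats=100, train_fraction=2 / 3, seed=0):
    """Yield (discovery, validation) index lists for repeated random splits."""
    rng = random.Random(seed)
    n_train = round(n_samples * train_fraction)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        yield sorted(order[:n_train]), sorted(order[n_train:])


def top_features(importance, percent=20):
    """Rank features by importance (descending) and keep the top `percent`."""
    n_keep = len(importance) * percent // 100
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:n_keep]


# Hypothetical importance scores for 216 features (illustration only).
scores = {f"feature_{i}": float(i) for i in range(216)}
selected = top_features(scores)
print(len(selected))  # 43 features, i.e. the top 20% of 216

# Example: 100 splits of 48 samples into 32 discovery and 16 validation samples.
discovery, validation = next(random_splits(48))
print(len(discovery), len(validation))
```

In a full pipeline, the importance scores would come from the model fitted on each discovery set (mean decrease in accuracy, feature weights, or variable importance in projection, depending on the algorithm), so that feature selection never sees the validation samples.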
The performance of these models was visualized by generating a ROC curve using the validation set class probabilities for each sample, and from these, the AUROC, as well as optimal sensitivities and specificities, were calculated (figure 1(A)). The AUROCs were similar for RF and linear SVM (0.844 and 0.802, respectively), while PLS-DA performed relatively poorly (0.605). The optimal thresholds for class probabilities ranged from 0.401 for PLS-DA to 0.526 for RF. At these optimal thresholds, RF achieved the highest specificity (0.782) relative to either linear SVM or PLS-DA (0.652 and 0.391, respectively), while PLS-DA achieved the highest sensitivity (0.913) relative to either RF or linear SVM (0.875 and 0.870, respectively), albeit with poor overall model performance. To assess the contribution of incubation time to the model performance, we considered the average prediction accuracies for samples at each of the four time points evaluated independently (supplementary figure S1 is available online at stacks.iop.org/JBR/12/026015/mmedia). RF yielded the highest mean sample classification accuracy at three of four sampling times (5, 48, and 72 h), while SVM yielded the highest accuracy at 24 h. PLS-DA yielded the lowest classification accuracy at all sampling points. Of note, classification accuracy was most highly variable at 72 h, probably related to the confounding effect of natural senescence (and possibly cell death) of the in vitro cell culture, irrespective of the infection process. The top discriminatory features obtained from the three models were compared to evaluate possible overlap. The number of features selected from discovery set samples to predict the classification of validation set samples was held constant across all three machine learning algorithms (n = 43, corresponding to the top 20% of discriminatory features).
In total, 92 distinct volatile metabolites were included in the selected features for one or more algorithm, of which nine (10%) were in common across all three algorithms, 10 (11%) between SVM and RF only, six (7%) between RF and PLS-DA only, and three (3%) between SVM and PLS-DA only. The remaining 64 (70%) were unique to a single algorithm ( figure 1(B) ). The ranks of these discriminatory features varied considerably between algorithms. For example, the most discriminatory feature from RF and PLS-DA was identified as hexadecane, which ranked 7th for SVM, while pentadecane, which ranked 1st for SVM, had lower ranks for both RF (2nd) and PLS-DA (4th). A comprehensive listing of all discriminatory volatile metabolites with their feature importance ranks across all three machine learning algorithms is presented in table 1. The relative concentration (A infected -A uninfected ) of all 92 discriminatory metabolites (putatively identified through mass spectral matching) was calculated at each time point individually. K-means clustering was used to identify metabolites with similar behavior as a function of time. Three main clusters were identified. Cluster I included three metabolites (#31: molecule not identified, #32: 2-methyl-pentane, #48: methyl sulfone) which were in highest abundance at the beginning of the infection process (5 h), and subsequently decreased and remained relatively constant between 24 and 72 h (figure 1(C)). Cluster II included four (#71: 2,4-dimethyl-heptane, #77: 4-methyloctane, #92: alkylated hydrocarbon, #97: alkylated hydrocarbon), which remained relatively constant between 5 and 48 h, and then substantially decreased at 72 h ( figure 1(C) ). Of note, for features in cluster II, increased expression was observed in the uninfected cells (rather than decreased expression in RSV-infected cells) at 72 h. Finally, cluster III encompassed the remaining 84, which exhibited no clear temporal trend (supplementary figure S2 ). 
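The clustering of temporal profiles described above can be sketched as follows. This is an illustrative pure-Python version of Lloyd's k-means algorithm, not the R routine used in the study. The farthest-first initialization is an assumption made here so that the example is deterministic (the source does not state how centroids were initialized), and the six four-point profiles are invented values of A_infected − A_uninfected, not study data.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two time profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(profiles, k):
    """Deterministic initialization: start at the first profile, then
    repeatedly add the profile farthest from all chosen centroids."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        nxt = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(profiles, k, max_iter=100):
    """Lloyd's algorithm; returns (assignments, centroids, inertia)."""
    centroids = farthest_first(profiles, k)
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in profiles]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(profiles, assign) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centroids[a]) for p, a in zip(profiles, assign))
    return assign, centroids, inertia


def elbow_curve(profiles, max_k):
    """Within-cluster sum of squares for k = 1..max_k (elbow method input)."""
    return [kmeans(profiles, k)[2] for k in range(1, max_k + 1)]


# Invented relative-concentration profiles at four sampling times.
profiles = [
    [10.0, 1.0, 1.0, 1.0], [11.0, 1.0, 1.0, 1.0],  # high early, then flat
    [5.0, 5.0, 5.0, -5.0], [5.0, 5.0, 5.0, -6.0],  # flat, then a late drop
    [0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],    # no clear trend
]
print(elbow_curve(profiles, 4))
print(kmeans(profiles, 3)[0])
```

With the elbow method, the number of clusters is chosen where the within-cluster sum of squares stops dropping sharply; in this toy data that happens at k = 3.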
The chromatographic data obtained for the comparison of cells infected with IAV versus uninfected MLE-Kd cells were pre-processed to remove artifacts, reducing the total number of peak features from 278 to 177. The performance of the models was visualized by generating ROC curves using the validation set class probabilities for each sample, and from these, the AUROCs, as well as optimal sensitivities and specificities, were calculated (figure 2(A)). The AUROCs were similar across the three algorithms employed, with SVM yielding the best overall performance (0.825), followed by RF (0.806), and PLS-DA (0.783). At the optimal classification probability thresholds, sensitivities and specificities were 0.792 and 0.792 for RF (optimal cut-off of 0.499), 0.708 and 0.875 for linear SVM (optimal cut-off of 0.530), and 0.708 and 0.708 for PLS-DA (optimal cut-off of 0.514). The most highly discriminatory volatile metabolites (top 20%, corresponding to 35 features) were retained and used to predict to which class samples in the validation set belonged. In total, 67 distinct volatile metabolites were included across RF, linear SVM, and PLS-DA, of which eight (12%) were common between all three algorithms, 15 (22%) between SVM and RF only, four (6%) between RF and PLS-DA only, and three (4%) between SVM and PLS-DA only. The remaining 39 (58%) were unique to a single algorithm (figure 2(B)). Of note, while the most discriminatory features identified from RF and SVM are similar in feature importance rank (e.g., features #127 and #91, which ranked 1st and 2nd using linear SVM, and 3rd and 2nd using RF, respectively), the top five features obtained using PLS-DA are not included in the top 20% for either RF or SVM, with the exception of #83, which was ranked 31st using RF.
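The overlap counts reported for the three algorithms correspond to the regions of a three-set Venn diagram. A minimal sketch of how such a partition can be computed is given below; `venn3` is a hypothetical helper and the feature identifiers are invented for illustration.

```python
def venn3(rf, svm, plsda):
    """Partition the union of three feature sets into the seven Venn regions."""
    rf, svm, plsda = set(rf), set(svm), set(plsda)
    return {
        "all_three": rf & svm & plsda,
        "rf_svm_only": (rf & svm) - plsda,
        "rf_plsda_only": (rf & plsda) - svm,
        "svm_plsda_only": (svm & plsda) - rf,
        "rf_only": rf - svm - plsda,
        "svm_only": svm - rf - plsda,
        "plsda_only": plsda - rf - svm,
    }


# Invented feature identifiers selected by each algorithm.
regions = venn3(rf={1, 2, 3, 4}, svm={2, 3, 5}, plsda={3, 4, 6})
for name, members in regions.items():
    print(name, sorted(members))

# The seven regions are disjoint, so their sizes sum to the size of the union.
print(sum(len(m) for m in regions.values()))
```

Because the regions are disjoint, their sizes must add up to the number of distinct features; for the RSV comparison, for example, 9 + 10 + 6 + 3 + 64 = 92.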
The contribution of incubation time to model performance was evaluated by considering the average prediction accuracies for samples at each time point (24, 49, 79 and 122 h) independently (supplementary figure S3). A general descending trend over time can be observed, with a median approximating 0.5 for all three models at 122 h. PLS-DA yielded the highest mean sample classification accuracy at 49 h with very low variability, while RF yielded optimal classification accuracy at 24 and 79 h. SVM showed large variability at all time points but represented the optimal classification model at 122 h. As with the cell cultures infected with RSV, the variability of prediction increased for the last time point (122 h) for all algorithms, probably due to changes in metabolite production linked to cellular senescence and death. The relative concentrations (A_infected − A_uninfected) of the 67 selected discriminatory metabolites (putatively identified through mass spectral matching) as a function of time were again evaluated using k-means clustering, and four main clusters were identified. The first cluster included three volatile metabolites (#2, #3, #4, all molecules not identified) whose relative abundance increased between 24 and 72 h, before a decrease by 122 h (figure 2(C)). The second cluster included three (#23: acetone, #31: molecule not identified, #44: alkylated hydrocarbon) that were detected at 49 h only, and not detected at the remaining time points. The third cluster included two (#35: not identified, #41: n-hexane) that increased between 24 and 49 h, then decreased at 79 h, only to increase again by 122 h. The relative concentrations of these latter features were negative across all time points, indicating that they were more highly abundant in uninfected controls. We therefore hypothesize that they were related to cell line aging rather than infection. Further studies are necessary to explain this behavior.
The fourth cluster included the remaining 59 metabolites, which demonstrated no clear trend as a function of time (supplementary figure S4).
Combining all the features selected from the different models used for discriminating between cells infected with virus (both RSV and IAV) versus uninfected cells, a list of 138 metabolites (20 in common between the two virally-infected cell lines) was generated and tentatively identified according to the criteria reported in the Materials and Methods. Sixty-five (47%) were classified as hydrocarbons, nine (7%) as aldehydes, eight (6%) as aromatic compounds, four (3%) as alcohols, four (3%) as ketones, three (2%) as heterocyclic compounds, two (1%) as sulfur-containing compounds, two (1%) as esters, and finally 41 (30%) as unknowns. Notably, hydrocarbons comprised a greater proportion of discriminatory metabolites in the comparison of RSV-infected versus non-infected HEp-2 cells relative to the comparison of IAV-infected versus non-infected MLE-Kd cells (56 of 95 compounds (59%) for RSV, versus 18 of 67 (27%) in the IAV experiment). All other chemical classes were similarly represented in the two sets of experiments. Five compounds (i.e., acetone, 2-propanol, o-xylene, benzaldehyde, and benzonitrile) have previously been reported in the headspace of cell cultures infected with viruses (three of which, namely 2-propanol, o-xylene, and benzaldehyde, in cells infected with IAV) [31, 33, 34], while forty have been reported in the headspace of cell cultures more generally (mostly cancer cell cultures) (table 1) [33, 34, 46-58].
The relatively minimal overlap between our study and prior studies that have considered in vitro cells infected with viruses is likely related to a number of factors: the low signal generated by this kind of sample; the different MOIs applied; differences in the cell lines, viral infections, growth conditions, and media used; differences in SPME fiber phase composition, which affects the selectivity of the extracted compounds; differences in the analytical techniques utilized; and the difficulty in assigning precise identifications to alkylated hydrocarbons, which are generally the most abundant chemical class. Most of the volatile metabolites tentatively identified can be attributed to chemical classes related to the lipid oxidation pathways, namely ketones, aldehydes, alcohols, and hydrocarbons. They have been reported to originate largely from free radical oxidative fragmentation of lipids due to oxidative stress [19, 59]. It has been shown that viral infection impairs the prooxidant-antioxidant balance in favor of the former by increasing the production of reactive oxygen species, in part through a NAD(P)H oxidase-dependent mechanism [60]. In particular, it has been shown that the activity of superoxide dismutase enzymes increases during viral infection, especially at a mitochondrial level [60]. This increase in reactive oxygen species is directly correlated with the formation of aliphatic hydrocarbons, which can explain the high abundance of hydrocarbons in our samples. However, these findings mainly refer to linear or iso-alkanes, while the origin of most of the alkylated hydrocarbons, which have been identified both in vitro and in vivo, is still unclear [59]. An exogenous source for these compounds can also be hypothesized, although a presently undefined metabolic process cannot be excluded; further research is needed to resolve this question.
Nine aldehydes were also putatively identified (i.e., 2-butenal, 2-propenal, 3-methyl-butanal, acetaldehyde, alkylated aldehyde, benzaldehyde, hexanal, nonanal, and propanal). These compounds have been related to lipid peroxidation during the inflammation process, where it is hypothesized that they serve as secondary messengers in signal transduction, gene regulation, and cellular proliferation [59, 61]. Three furan derivatives were found (i.e., furan, 2,3-dihydro-furan, and tetrahydrofuran), which were previously identified in the headspace of cell culture and bacteria [19, 59]. Two sulfur-containing compounds (i.e., methyl sulfone and bis[1-(methylthio)ethyl] disulfide) were also identified. The formation of sulfur compounds has been linked to the sulfur-containing amino acids methionine and cysteine in the transamination pathway, which is affected by oxidative stress, causing a depletion of such sulfur-containing amino acids [62, 63].
A direct comparison of the volatile metabolic signatures produced by RSV and IAV infection was not possible due to differences in the composition of headspace volatiles at baseline (i.e., differences between uninfected HEp-2 and MLE-Kd cells). We attempted to identify a volatile metabolic fingerprint that could discriminate between infected cells but not discriminate between uninfected cells by using recursive feature elimination coupled to RF (RFE-RF). However, the differences at baseline were sufficiently great that it was not possible to make such a comparison effectively. This may have resulted from numerous factors, including: (1) the use of different growth media across cell lines, (2) the comparison of human (HEp-2) and murine (MLE-Kd) lineages, and (3) the comparison of transformed (HEp-2) versus non-transformed (MLE-Kd) lineages.
RFE-RF resulted in the identification of 10 volatile metabolites that could differentiate between RSV-infected HEp-2 cells and IAV-infected MLE-Kd cells with approximately 74.9% accuracy, but which also differentiated between uninfected HEp-2 cells and uninfected MLE-Kd cells with 60.0% accuracy. Because of our inability to discriminate between uninfected HEp-2 and MLE-Kd cells, we elected to not report on those compounds that were most highly discriminatory between RSV- and IAV-infected cells, as differences in the production of these metabolites may have resulted from factors other than the type of virus used for infection. However, we do note that 21 of the compounds reported as discriminatory overall (table 1) were discriminatory for both sets of experiments (i.e., RSV-infected cells versus uninfected cells and IAV-infected cells versus uninfected cells). Amongst these 21 compounds, we putatively identified seven hydrocarbons (2-methylpentane, dodecane, and five generic alkylated hydrocarbons), four aromatics (p-xylene, ethylbenzene, toluene, and benzene), two heterocycles (2,3-dihydrofuran and tetrahydrofuran), two alcohols (ethanol and 1-propanol), one aldehyde (acetaldehyde), and one ketone (acetone). The identities of four compounds remain unknown. Notably, ethanol, benzene, and dodecane represent the three metabolites that were identified as discriminatory by two or more machine learning algorithms in both the RSV and IAV experiments.
In the present study, we have evaluated the potential of volatile metabolites to discriminate between virally-infected and uninfected cells using three different machine learning algorithms, demonstrating the potential effectiveness of the approach. The use of SPME coupled to GC×GC-ToF MS generated 216 and 177 features from the headspace of cells infected with RSV and IAV, respectively. The GC×GC-ToF MS system results in improvements in sensitivity and identification ability compared to conventional GC.
The volatile profile obtained resulted, in part, from the specific selectivity of the SPME fiber (PDMS/Car/DVB) used, and does not necessarily mirror the true profile present in the headspace of the sample. Relatively few of the compounds identified here have been previously reported in the literature, likely owing to both biological (different MOI, growth conditions, media, and cell culture) and analytical (sample preparation and analytical determination methods) differences. The choice of host cells was based on their permissiveness to high levels of viral replication, and under these conditions we were able to discriminate between virally-infected and uninfected cells. However, these findings do not necessarily generalize to other cell types. Moreover, the use of different cell lineages for RSV and IAV infections did not allow for the comparison of infections caused by different viruses.
Viral infection results in the alteration of numerous biochemical pathways, a subset of which involve the production of small molecules that can cross the cell membrane and thus be detected in the headspace of an infected cell culture. Here we show that volatile compounds can be used to effectively discriminate between infected (RSV and IAV) and uninfected cells. The abundance of these discriminatory volatiles can fluctuate over time according to the infection stage; nevertheless, effective discriminatory prediction was obtained irrespective of the sampling time post-infection, although accuracy decreased after 72 h and 122 h for RSV and IAV, respectively. Future work in this area should involve investigating the utility of volatile metabolites to discriminate between infections caused by different viruses in a single cell line, as well as generating insight into viral pathogenesis. Furthermore, the use of a common cell line for culturing both viruses, specifically a non-transformed human lung epithelial cell line, will be considered.
In the present experiments, different cell lines were chosen because they support optimal replication of the selected viruses, and this limited our ability to identify volatile metabolites that could differentiate between viruses. Further studies will be carried out to answer this latter question.
United States
UNICEF 2015 Levels and Trends in Child Mortality
United States Centers for Disease Control and Prevention 2015 National Action Plan for Combating Antibiotic-Resistant Bacteria 1-63
CDC 2013 Antibiotic resistance threats in the United States
Community-acquired pneumonia requiring hospitalization among US children New Engl
Addressing the appropriateness of outpatient antibiotic prescribing in the United States: an important first step
World Health Organization (WHO) 2005 WHO Recommendations on the Use of Rapid Testing for Influenza Diagnosis
Diagnostic problems in lower respiratory tract infections
Colonization and infection of the respiratory tract: what do we know? Paediatr
Review of rapid diagnostic tests used by antimicrobial stewardship programs
Filmarray, an automated nested multiplex PCR system for multi-pathogen detection: development and application to respiratory tract infection
Evaluation of four commercial multiplex molecular tests for the diagnosis of acute respiratory infections
Review article: 13C-urea breath test in the diagnosis of Helicobacter pylori infection-a critical review
Clinical application of volatile organic compound analysis for detecting infectious diseases
Secondary electrospray ionization-mass spectrometry (SESI-MS) breathprinting of multiple bacterial lung pathogens, a mouse model study
Detecting bacterial lung infections: in vivo evaluation of in vitro volatile fingerprints
Robust detection of P. aeruginosa and S. aureus acute lung infections by secondary electrospray ionization-mass spectrometry (SESI-MS) breathprinting: from initial infection to clearance
Bacterial volatiles: the smell of small organisms
A comprehensive time-course-based multicohort analysis of sepsis and sterile inflammation reveals a robust diagnostic gene set
Comprehensive validation of the FAIM3:PLAC8 ratio in time matched public gene expression data Am
A molecular host response assay to discriminate between sepsis and infection-negative systemic inflammation in critically ill patients: discovery and validation in independent cohorts
A molecular biomarker to diagnose community-acquired pneumonia on intensive care unit admission Am
Gene expression profiles in febrile children with defined viral and bacterial infection Proc
A host-based RT-PCR gene expression signature to identify acute respiratory viral infection Sci
Superiority of transcriptional profiling over procalcitonin for distinguishing bacterial from viral lower respiratory tract infections in hospitalized adults
Innate immunity: structure and function of
Innate immunity: minireview the virtues of a nonclonal system of recognition
Innate immune recognition: mechanisms and pathways
A comparison of cytokine responses in respiratory syncytial virus and influenza A infections in infants
Effect of influenza vaccination on oxidative stress products in breath
Volatile emanations from in vitro airway cells infected with human rhinovirus
Volatile organic compound gamma-butyrolactone released upon herpes simplex virus type -1 acute infection modulated membrane potential and repressed viral infection in human neuron-like cells
Volatile organic compounds generated by cultures of bacteria and viruses associated with respiratory infections
Modulators for comprehensive two-dimensional gas chromatography
Linear retention indices in gas chromatographic analysis: a review Flavour Fragrance J
Comparison of volatile organic compounds from lung cancer patients and healthy controls-challenges and limitations of an observational study
Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in 1H NMR metabonomics
Partial least squares for discrimination
Visualization and recovery of the (Bio)chemical interesting variables in data analysis with support vector machine classification
Youden index and optimal cutpoint estimated from observations affected by a lower limit of detection
Introduction to Machine Learning ed T Dietterich 2nd edn
A compendium of volatile organic compounds (VOCs) released by human cell lines
TD-GC-MS analysis of volatile metabolites of human lung cancer and normal cells in vitro Cancer Epidemiol
Release of volatile organic compounds (VOCs) from the lung cancer cell line CALU-1 in vitro Cancer Cell Int
Volatile biomarkers from human melanoma cells
Release of volatile organic compounds from the lung cancer cell line NCI-H2087 in vitro
Investigation of cell culture volatilomes using solid phase micro extraction: options and pitfalls exemplified with adenocarcinoma cell lines
Analysis of volatile organic compounds liberated and metabolised by human umbilical vein endothelial cells (HUVEC) in vitro
In vitro detection of small molecule metabolites excreted from cancer cells using a Tenax TA thin-film microextraction device
Quantification of acetaldehyde and carbon dioxide in the headspace of malignant and non-malignant lung cells in vitro by SIFT-MS
Identification of volatile biomarkers of gastric cancer cells and ultrasensitive electrochemical detection based on sensing interface of Au-Ag alloy coated
Unique volatolomic signatures of TP53 and KRAS in lung cells
Headspace sorptive extraction and GC-TOFMS for the identification of volatile fungal metabolites
Assessment, origin, and implementation of breath volatile cancer markers
Respiratory syncytial virus induces oxidative stress by modulating antioxidant enzymes Am
Reactive oxygen species and alpha,beta-unsaturated aldehydes as second messengers in signal transduction Ann
Measurement and biological significance of the volatile sulfur compounds hydrogen sulfide, methanethiol and dimethyl sulfide in various biological matrices
Oxidative stress-induced regulation of the methionine metabolic pathway in human lung epithelial-like (A549) cells
Financial support for this work was provided by the Hitchcock Foundation and the National Institutes of Health (NIH, Project #1R21AI12107601). CAR was supported by the Burroughs Wellcome Fund Institutional Program Unifying Population and Laboratory Based Sciences, awarded to Dartmouth College (Grant #1014106), and a T32 training grant (T32LM012204, PI: Christopher I Amos). P-H Stefanuto is a Marie-Curie COFUND postdoctoral fellow co-funded by the European Union and the University of Liège. The authors gratefully acknowledge Supelco for providing the SPME fiber.
Giorgia Purcaro https://orcid.org/0000-0002-8235-9409
Christiaan A Rees https://orcid.org/0000-0003-1896-5348