key: cord-0119716-hzyscbeg authors: Adhikary, Rishiraj; Lodhavia, Dhruvi; Francis, Chris; Patil, Rohit; Srivastava, Tanmay; Khanna, Prerna; Batra, Nipun; Breda, Joe; Peplinski, Jacob; Patel, Shwetak title: SpiroMask: Measuring Lung Function Using Consumer-Grade Masks date: 2022-01-23 journal: nan DOI: nan sha: 052d32a2f7c344289eaed575c82895aa61591be0 doc_id: 119716 cord_uid: hzyscbeg

According to the World Health Organisation (WHO), 235 million people suffer from respiratory illnesses, and four million people die annually due to air pollution. Regular lung health monitoring can provide early warning of deteriorating lung health. This paper presents SpiroMask, a system that retrofits a microphone in consumer-grade masks (N95 and cloth masks) for continuous lung health monitoring. We evaluate our approach on 48 participants (including 14 with lung health issues) and find that we can estimate parameters such as lung volume and respiration rate within the error range approved by the American Thoracic Society (ATS). Further, we show that our approach is robust to sensor placement inside the mask.

Fig. 1. SpiroMask: our system to estimate lung health parameters by processing the audio data collected from a microphone fitted inside consumer-grade masks.

countries¹. COPD development generally starts early in life due to a complex interplay of disadvantageous factors, many of which occur in low- and middle-income countries [6]. Studies have shown that the diagnosis of lung disease is typically missed or delayed until poor lung health has advanced. Early diagnosis of lung ailments can positively influence the disease course, slowing progression, relieving symptoms and reducing the incidence of exacerbations [48, 59]. However, the unavailability of lung function equipment hinders proper diagnosis [52]. Regular lung health monitoring can provide early warning of deteriorating lung health [2, 56].
Typically, two kinds of breathing data are used for deriving lung health biomarkers:
• Tidal (normal) breathing: inhalation and exhalation during restful breathing. Lung disease can change the normal character of tidal breathing [34].
• Forced breathing: a deep inhalation that fully expands the chest, followed by a forceful exhalation. Forced breathing is voluntary in nature.
Forced Expiratory Flow (FEF) measured from forced breathing is the most widely used method to assess the severity of asthma. Monitoring the forcefully exhaled airflow can help diagnose the onset of asthma, COPD and other conditions that affect breathing [26, 47, 58]. However, FEF measurements require a controlled environment. Moreover, a majority of young and old patients with airway obstruction cannot perform adequate forced breathing maneuvers [58]. Prior research has shown that tidal breathing patterns can also be used to detect and quantify airway obstruction [62]. Respiration rate, defined as the number of breaths a person takes per minute, can be derived from tidal breathing. It is an important marker of cardiac arrest and dyspnea [13, 17, 24, 63], and is used for assessing sleep quality and monitoring stress [12]. Lung inflammation caused by COPD deterioration or lung infection leads to a higher respiration rate [26, 36]. In clinical settings and hospitals, exhaled airflow is measured using a spirometry test (Figure 3(a)). During a spirometry test, a patient breathes forcefully through a flow-monitoring device (a tube or mouthpiece), which measures instantaneous flow and cumulative exhaled volume. However, spirometry tests performed at hospitals are not transient, and the recent global pandemic has led to the suspension of certain non-urgent healthcare services such as routine diagnostic testing [20, 23].
Although home spirometry tests are available [33], even the cheapest hand-held digital spirometer costs about USD 300. Previous work has accurately estimated forced breathing parameters using a smartphone [25]. In smartphone spirometry, a person must perform a forceful breathing maneuver towards the phone. However, microphone-based smartphone spirometry must account for, among other factors, variability in the environment and heterogeneity across smartphones. Previous studies [36] have also used smartphone accelerometers for continuous respiration monitoring. However, this approach requires a controlled environment in which the participant places the smartphone at a particular location on the chest. Recently, studies [72] have shown that wearable spirometry can be conducted using a pressure sensor inside a specialised mask meant for athletes. Progress on mask spirometry is limited because i) the evaluations were done on healthy adults alone, ii) there was no continuous monitoring of respiration rate, iii) athlete training masks are relatively costly (40-50 USD) compared to consumer-grade masks (5-10 USD for an N95 mask), and iv) athlete training masks are meant to restrict the oxygen a person receives to simulate a high-altitude environment², so they cannot replace commonly used cloth or N95 masks. In Section 8 we qualitatively show that the general population would not prefer a specialised mask as a daily wearable. This paper shows that consumer-grade masks (N95 and cloth masks) can be used for continuous lung health monitoring by processing the signal from a microphone retrofitted inside the mask (Figure 1). The main intuition behind using audio is to leverage the relationship between variation in airflow rate and the change in intensity of tracheal sound [68]. Our work addresses the limitations of prior research [1, 26, 72].
In particular, retrofitted consumer-grade masks i) are robust to environmental variability, ii) can sense tidal breathing, iii) are accurate for participants with lung ailments, and iv) are passive, i.e. the user does not need to interact with the device. Our approach uses the audio waveform envelope and the filtered audio signal to derive vital lung parameters and to continually monitor respiration rate. We used machine learning with sequential forward selection to learn a set of audio features that accurately estimate lung health. We have two separate pipelines for estimating parameters from forced breathing and from tidal breathing. To estimate respiration rate (from tidal breathing), we used a neural network that distinguishes tidal breathing from speech and noise. A peak detection algorithm applied to the tidal breathing signals then accurately calculates the respiration rate. The parameters estimated from forced breathing are described in the next section. Our study was approved by the Institutional Review Board (IRB). We recruited 48 participants, including 15 female participants. A total of 14 participants had lung ailments. The numbers of healthy and unhealthy participants in our study are comparable to the study populations of past studies [15, 26, 38, 65, 69, 72, 73]. For each participant, the study lasted about an hour. The mean percentage errors on forced breathing parameters were between 3.24% and 4.86% for cloth masks and between 3.26% and 4.36% for the N95 mask. We achieved an accuracy of 94.7% in classifying tidal breathing versus noise and speech. The mean absolute error in estimating respiration rate was 0.36 for the cloth mask and 0.47 for the N95 mask. Our results on forced breathing are within the acceptable error range endorsed by the American Thoracic Society (ATS). We performed a sensitivity analysis on the position of the sensor inside the mask. Our approach is robust to sensor placement for forced breathing.
However, certain positions inside the mask (directly below the nose) are ideal for estimating respiration rate from tidal breathing. To summarise, the main contributions of this paper are:
• SpiroMask: a novel passive mask-based system that uses a microphone on consumer-grade masks to estimate forced and tidal breathing lung health parameters. SpiroMask
  - is accurate within the ATS guidelines,
  - works well for both healthy and unhealthy subjects, and
  - is robust to the microphone placement.
• Public dataset: We publicly release our dataset on GitHub³. We believe ours is the first large publicly available dataset that measures both tidal and forced breathing parameters, with ground truth, for 48 participants (including 14 with lung ailments), and that it can help advance research in the community.
• Reproducibility: We believe that our work is fully reproducible. The same repository contains our code, and every generated table and graph has a corresponding script that reproduces it. We believe these efforts will lower the effort needed to replicate and build on our work.
This section outlines the basics of spirometry and lung function indices, followed by an overview of prior work on audio-based and pressure-based sensing of lung indices. Spirometry is a widely used pulmonary function test. It measures how much air the patient can breathe out and how fast, and it is the most widely employed objective measure of lung function [57]. Spirometry tests are usually performed in clinics or hospitals, and the most commonly used devices for respiratory evaluation are currently hand-held spirometers. A spirometry test consists of the following sequence of events [16]:
• A soft clip is placed on the patient's nose, and she starts breathing normally through her mouth.
• The patient wraps her lips tightly around the spirometer mouthpiece shown in Figure 3(a), ensuring that all exhaled air goes into the spirometer for accurate measurement.
• The patient then takes the deepest possible breath, filling her lungs to the maximum.
• The patient exhales hard and fast and continues exhaling into the spirometer until no more air comes out.
Spirometry results are effort dependent: patients need to be coached by a trained professional to perform a forceful exhalation for a successful test. A digital spirometry test produces the flow versus volume plot of the lung⁴, from which vital lung parameters are extracted. A forced breathing maneuver therefore requires a controlled environment with proper guidance from doctors. A standard spirometer measures the volume and flow of air that can be inhaled and exhaled, and generates pneumotachographs, which plot the volume and flow of air moving in and out of the lungs. The most common parameters measured by spirometry are forced vital capacity (FVC), forced expiratory volume in one second (FEV1) and peak expiratory flow (PEF). FVC is the total volume of air exhaled, denoted by C in Table 1; FEV1 is the volume exhaled in the first second of exhalation, denoted by B; and PEF is the maximum expiratory flow rate, denoted by A. Researchers have explored turning existing mobile devices and smartphones into portable electronic spirometers using their built-in microphones together with machine learning techniques [25, 37, 46, 47, 55, 61, 73]. These studies use features from the Hilbert transform [19], linear predictive coding [32] and the spectrogram of an audio signal to train a linear regression model for each vital lung parameter, i.e. FVC, FEV1 and PEF.

Table 1. Forced breathing and tidal breathing lung function parameters. (Recoverable row fragments: a ratio noted as "should be > 80% among healthy [7]"; E: RR, Respiratory Rate (samples/min).)
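To make the definitions of FVC, FEV1 and PEF given above concrete, here is a minimal Python sketch (not the spirometer's or the authors' implementation) that derives them from an expiratory flow-time curve. It assumes a hypothetical input: flow in litres per second, sampled at `fs` Hz from the onset of forced exhalation, with volume obtained by simple rectangle-rule integration. The function name `spirometry_indices` is illustrative.

```python
import numpy as np


def spirometry_indices(flow_lps, fs):
    """Derive FVC, FEV1, PEF and FEV1/FVC from an expiratory flow curve.

    flow_lps: expiratory flow in litres per second (hypothetical input),
    sampled at fs Hz and starting at the onset of forced exhalation.
    """
    flow = np.clip(np.asarray(flow_lps, dtype=float), 0.0, None)  # keep expiratory flow only
    volume = np.cumsum(flow) / fs                  # cumulative exhaled volume (L), rectangle rule
    fvc = volume[-1]                               # total volume exhaled
    one_second = min(int(round(fs)), len(volume)) - 1
    fev1 = volume[one_second]                      # volume exhaled in the first second
    pef = flow.max()                               # peak expiratory flow (L/s)
    return {"FVC": fvc, "FEV1": fev1, "PEF": pef, "FEV1/FVC": fev1 / fvc}
```

For an exponentially decaying synthetic flow that starts at 8 L/s with a 0.6 s time constant, sampled at 100 Hz for 6 s, the sketch returns PEF = 8 L/s, an FVC of about 4.84 L and an FEV1/FVC ratio of about 0.81.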
Compared with the clinical spirometer, the reported mean errors are 5.1%, 5.2% and 6.3% for FEV1, FVC and PEF, respectively. In our work, we show that the Hilbert transform combined with a finite impulse response (FIR) filter gives a comparable or better estimate of lung function parameters, while avoiding non-trivial modelling such as linear predictive coding. Also, because the audio sensor is fixed in the mask, we do not need to compensate for pressure losses over the variable distance from the mouth to the microphone, or for reverberation and reflections in and around the subject's body. Researchers have also proposed a variable frequency complex demodulation (VFCDM) technique to extract the FEV1/FVC ratio from audio [55]. A built-in smartphone microphone was also used in [73] to estimate FEV1 and FVC. However, they do not estimate PEF or tidal breathing parameters such as respiration rate. Commodity smartphones have also been used to measure human chest-wall motion via acoustic sensing, and lung function indices are deduced from the measured motion [47]. Existing literature on extracting pulmonary function test (PFT) parameters requires a person to perform the forced breathing maneuver in a controlled, noiseless environment. Smartphone-based spirometry also has the following limitations:
• It requires accounting for variability in the environment (e.g. the distance between microphone and mouth).
• Not all smartphones are created equal. The flow detected by a smartphone microphone [26] relies on the mechanical transduction of sound, which is affected by the position of the microphone and the physical casing surrounding it [29].
• It is not suited for sensing tidal breathing, as a person will not keep the smartphone near their nose for a long duration.
• It is relatively less accurate for participants with lung ailments [26, 47].
• It requires the user to actively interact with the smartphone.
Recent efforts have integrated audio and pressure sensors into face masks to extract vital lung parameters [1, 72].
Researchers have used a MEMS-based barometric pressure sensor inside an athlete training mask⁵. Such masks are specially designed for athletes: they restrict breathing, making an athlete feel as if they are at high altitude. Accurate wearable spirometry has been performed in athlete masks, with error margins of 2.9% and 3.3% for FVC and FEV1, respectively [72]. Researchers have also integrated an audio sensor inside a surgical mask to estimate FVC from the energy of the audio signal, but they did not quantify the error [1]. However, progress in mask spirometry is limited in the following ways:
• Continuous monitoring of tidal parameters was absent.
• The proposed approaches are not suitable for consumer-grade N95 and cloth masks.
• Previous work did not include participants with lung ailments in their user studies, so the efficacy for such participants is unknown.
• There was no discussion of the robustness of sensor positioning inside the mask.
Table 2 compares our work with previous literature and shows the novelty of our work: i) estimating lung parameters for consumer-grade cloth and N95 masks; ii) a single system for measuring both forced and tidal breathing parameters; iii) a larger study with both healthy and unhealthy subjects; and iv) a discussion on the robustness of sensor placement. Prior literature has successfully used both an in-situ microphone and a MEMS barometer inside a very tightly sealed athlete mask to sense breathing and perform portable spirometry [1, 72]. However, our experiments show that a pressure sensor cannot be used to sense breathing inside a consumer-grade cloth mask. Figure 2(a) and (b) show that the pressure sensor outputs a similar signal whether the cloth mask is worn or not (Pearson correlation coefficient ρ = 0.88). The amplitude in Figure 2(b) is higher than in Figure 2(a) because the pressure inside a mask is higher than atmospheric pressure.
However, the signal shows little to no change that would distinguish inhalation from exhalation. Previous work [72] leveraged the change in pressure to detect forceful breathing inside a special kind of mask. However, the differential pressure is insignificant in N95 and cloth masks, making the breathing signal unrecognisable. This implies that pressure sensors are ineffective in distinguishing tidal breathing in cloth masks unless the sensor is sealed from atmospheric pressure. Figure 2(c) and (d) show that a microphone does a better job of sensing tidal breathing inside a cloth mask: its signals with and without the mask are essentially uncorrelated (Pearson correlation coefficient ρ = 0.01). These results suggest that commodity-grade pressure sensors placed on standard masks are likely to result in poorer lung health estimation than audio sensing. Thus, we do not baseline against the previous work on mask-based spirometry [72], as that approach does not work on consumer-grade masks.

Fig. 2. (a) and (b) show that tidal breathing is indistinguishable from a pressure sensor placed inside the cloth mask, whereas (c) and (d) show that a microphone is more suitable inside a cloth mask to monitor tidal breathing. Thus, previous work [72] in mask-based spirometry does not work on consumer-grade masks.

We now describe our data collection procedure. Our user study on SpiroMask was approved by the Institutional Review Board (IRB). Anyone between 18 and 70 years of age was eligible to participate. People who were severely ill and had been advised bed rest by doctors were not allowed to participate. We recruited 52 participants for the study, of whom 16 reported having lung ailments. All participants were remunerated as per institute guidelines. Every participant filled out an entry survey before the user study, in which we asked them to declare their age, height, weight, any recent illness, history of contracting COVID-19, and whether they had a meal before coming for the user study.
We also asked whether they had a clinically validated history of lung ailments and whether they had taken a spirometry test in the past.
3.3.1 Microphone.
We recorded the audio of forceful expiration and tidal breathing using an Arduino Nano Sense microcontroller⁶, chosen because of its embedded sensors such as the MEMS microphone. Its compact size makes it possible to affix the microcontroller on a face mask. The microphone has a sampling rate of 16 kHz. We placed the microcontroller inside a 3D-printed enclosure (Figure 3(f)) to protect it from static discharge and wear and tear. The enclosure is affixed to the mask using velcro and double-sided tape (Figure 3(e)). We advised the participants to affix the microcontroller on their mask themselves to ensure their comfort and safety while wearing it. We confirmed that the mask was worn appropriately before starting the SpiroMask experiment. A future version of our system would be smaller and could be hand-sewn onto the fabric. Our setup also included the Helios 401⁷ medical-grade (ISO 14971:2019 [54]) hand-held spirometer, which we used to collect the ground-truth lung capacity of every participant. We used a Samsung Galaxy M20 smartphone to collect accelerometer data on the expansion and contraction of the chest; the accelerometer had a sampling rate of 100 Hz. The sensor inside the mask was connected to a desktop computer via a USB cable. To avoid data loss and interruptions (due to a draining battery) during the user study, we did not use any wireless mode of audio data transfer. For every participant, the investigator checked that the microphone and the smartphone were responding to remote commands. We used the MATLAB Mobile application⁸ to collect accelerometer data from the smartphone. An investigator could issue remote commands from a laptop to retrieve audio and IMU data from the microphone and smartphone. The sensitive audio data was stored securely in the cloud.
We followed all local COVID-19 guidelines during the entire process. The tests were done in a well-ventilated room with a single participant at a time. The investigator was double vaccinated and wore an N95 mask throughout the experiment. On arrival, participants were asked to sanitise their hands. In the spirometry test, each participant used a new mouthpiece. We did not reuse any N95 or cloth masks; the investigator gave each participant a fresh mask of each type.

Table 3. Participant demography (partially recovered: Total participants (n): 48, 37; Participants with lung ailments (n, %): 14).

We started the user study with data collection for forced breathing, as shown in Figure 3. We conducted the study in two phases: Phase 1 was in January-February 2021, and Phase 2 began in July 2021⁹. Table 3 summarises the participant demography. The investigators explained the entire user study to the participants and demonstrated the spirometry test. For most participants, we obtained the spirometry ground truth on the first attempt. The proprietary spirometer software flagged incorrect attempts for some participants; for every incorrect maneuver, we repeated the spirometry test at least once, after which we obtained the ground truth. The SpiroMask test followed the spirometry test. A hand-held spirometer makes forceful exhalation through the mouth possible because of its dedicated mouthpiece. Based on pilot experiments, we realised that when a face mask is worn it is easier to exhale forcefully through the nose with the mouth closed. We asked the participants to put the sensor inside the mask at position R3 (Figure 4) and wear it. We issued the remote command to start data collection and instructed the participant to begin deep inhalation and forceful exhalation. Each forceful breath lasted 6 to 8 seconds. We collected samples of forceful breathing audio at different positions inside the mask (L1, C1, R1, L3, R3 in Figure 4).
We chose these positions based on our pilot experiments and the relative comfort of placing the sensor at them. We then repeated the SpiroMask test using the cloth mask. Despite several attempts, four participants were unable to perform the spirometry test. One participant had buccinator muscle pain, making it impossible for her to hold the spirometer mouthpiece. Three other participants could not perform a proper forceful breathing maneuver; they complained about lung discomfort when asked to attempt a forceful exhalation. The investigator decided not to continue the spirometry test for these three individuals, anticipating worsening medical conditions. Although these three people reported relative comfort using SpiroMask, we still had to discard their samples owing to the lack of verified ground truth. We finally had data for 48 participants, of whom 14 had lung ailments. The numbers of healthy and unhealthy participants in our study are comparable to or larger than those in several similar studies [15, 26, 38, 65, 69, 72, 73].

⁹ Our country was very significantly impacted by COVID-19 between March and June, and thus we suspended data collection during that time.

Fig. 3. a) and b) show the spirometry test followed by a SpiroMask test. c) shows the tidal breathing user study, where a smartphone is placed on the sternum with the help of a belt. d) shows how the retrofit device was fixed into both types of mask. e) shows the protective velcro tape layer above the sensor that protects it from mucus, and f) shows the microcontroller placed inside the 3D-printed casing.

Data collection on tidal breathing started in the second phase (July 2021) of our user study. Table 3 summarises the participant demography. First, we asked the participants to place the smartphone on the sternum with the help of a chest belt, as shown in Figure 3.
Previous work [10] has shown that placing an accelerometer on the sternum can retrieve the respiration rate by leveraging the movement of the rib cage and the abdomen. Previous work has also combined a stretch sensor with a motion sensor in a chest belt to measure breathing parameters, even when the user is ambulatory [65]. Proprietary respiration belts were used in some studies [45, 69] to collect ground truth on respiration rate; such systems are very expensive and unavailable in our country. As in the SpiroMask test for forced breathing, we asked the participant to wear the mask and start tidal breathing. We issued remote commands from a laptop to start data collection from the smartphone and the microphone simultaneously, and instructed the participants to begin counting their exhalations after the command was issued. Each sample of tidal breathing lasted 20 seconds. We took at least two samples at each position (L1, C1, R1, L3, R3 in Figure 4) of the mask; these positions covered the entire area of both masks where the sensor could be placed. We repeated the SpiroMask test for tidal breathing on a cloth mask. Some participants were not comfortable with strapping the smartphone to their chest with a belt; for all such participants, we relied on their self-count and a metronome test. We also performed a metronome test [41] for a subset of participants to validate the use of the accelerometer to detect the actual respiration rate. Previous studies have explicitly used a metronome as ground truth for respiration rate [10]. The metronome test is similar to the tidal breathing procedure, but in addition we asked the participant to inhale and exhale according to the clicks of a 40 beats per minute (BPM) metronome played on a desktop PC. The metronome clicks were not audible in the tidal breathing audio, probably because of their low amplitude.
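The accelerometer ground truth can be illustrated with a short Python sketch. It smooths one chest-accelerometer axis (sampled at 100 Hz) with the 20-point moving average reported for Figure 5 and counts breathing cycles as peaks with SciPy's `find_peaks`. Using a single axis, removing the mean, and the 1.5 s minimum spacing between breaths are assumptions for illustration, not the authors' settings; `count_breaths_imu` is a hypothetical name.

```python
import numpy as np
from scipy.signal import find_peaks


def count_breaths_imu(accel, fs=100.0, window=20, min_gap_s=1.5):
    """Count breathing cycles in one chest-accelerometer axis.

    window: moving-average length in samples (20, as reported for Figure 5).
    min_gap_s: assumed minimum time between breaths (hypothetical value).
    """
    x = np.asarray(accel, dtype=float)
    x = x - x.mean()                                   # remove the gravity (DC) offset
    kernel = np.ones(window) / window
    smoothed = np.convolve(x, kernel, mode="same")     # moving-average filter
    peaks, _ = find_peaks(smoothed, distance=max(1, int(min_gap_s * fs)))
    return len(peaks)                                  # one peak per breathing cycle
```

On a synthetic 20 s trace breathing at 20 cycles per minute (the 40 BPM metronome pace), the sketch counts 7 peaks. This is consistent with roughly 6.67 cycles per 20 s window, since a window can contain 6 or 7 peaks depending on the breathing phase at its start.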
We chose 40 BPM because the participants were more comfortable at a slower breathing pace. Figure 5 shows a 20-second window of audio and IMU data for a participant breathing with the metronome. The IMU data was filtered using a moving average filter with a window of 20 data points. There are six breathing cycles, which is close to the theoretical number: with one click for each inhalation and each exhalation, 40 BPM corresponds to 20 breathing cycles per minute, so a 20-second window should contain
$\frac{20\ \text{cycles}}{60\ \text{s}} \times 20\ \text{s} \approx 6.67\ \text{cycles}.$
Figure 5 thus supports the use of the accelerometer as ground truth for respiration rate. Finally, the participants filled out an exit survey giving feedback on the comfort of the masks and the spirometry test. In particular, we asked the following optional questions:
(1) Can you compare the spirometer and SpiroMask tests: which did you prefer and why?
(2) Which mask (N95 or cloth) feels more comfortable to you and why?
(3) Can you compare SpiroMask and the phone with chest belt for respiration rate monitoring: which did you prefer and why?
The entire user study took approximately an hour per participant. The goal of SpiroMask is to estimate vital lung parameters and respiration rate from the audio of breathing maneuvers. Our work is inspired by advances in smartphone [25] and wearable spirometry [72]. The main intuition behind using audio is to leverage the relationship between variation in airflow rate and the change in intensity of tracheal sound [68]. We now describe the pipelines for estimating the forced breathing and tidal breathing parameters. A spirometry test comprises a forceful exhalation through a flow-monitoring device that measures instantaneous flow and cumulative exhaled volume (Table 1). Similarly, a SpiroMask user wears our smart mask, breathes in to full lung volume, and exhales forcefully. The entire pipeline for extracting forced breathing parameters is shown in Figure 6. We now explain the steps below.
(i) Recording audio: The microphone inside the mask records the exhalation and sends the audio data to a computer.
(ii) Normalising amplitude: The audio data is normalised between -1 and 1 [53].
(iii) Clipping audio: In keeping with the ATS guidelines [11], we extract the part of the signal from one second before the start of forceful exhalation until its end. The start of forceful exhalation is detected using an amplitude threshold.
(iv) Envelope detection: The envelope of the audio signal can be taken as a reasonable approximation of the flow rate because it measures the overall signal power (or amplitude) at low frequency [25]. We estimate the acoustic envelope of the forceful exhalation using the Hilbert transform (HT) [19], a method employed in previous acoustic flow estimation studies [25, 53]. To validate the HT-estimated envelope for a signal with multiple harmonics, we generated a synthetic amplitude-modulated signal from a message signal (the envelope) and a carrier wave [35, 43, 49]. Figure 17 (in the Appendix) shows that the estimated envelope is a poor fit to the true envelope. Since an audio signal comprises multiple harmonics, we improve the HT envelope by processing it with a finite impulse response (FIR) filter with a Kaiser window [35]. The error margin of the estimated lung parameters depends on the stopband attenuation and transition width of this filter.
Cepstral Coefficient (MFCC MVN), the power spectrum of 257 frames, 64 Mel bands of the spectrogram, and 12 chroma bins generated from the power spectrum of the audio waveform. Besides these features, we also generated temporal and statistical features from the audio waveform and the FV curve [4]. Temporal features include entropy, peak-to-peak distance, the centroid of the signal, etc.
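Steps (ii) to (iv) above can be sketched in Python with NumPy and SciPy. This illustrates the technique under stated assumptions and is not the authors' code. The onset threshold, the 20 Hz low-pass cut-off, the 20 Hz transition width and the 60 dB attenuation are hypothetical values. `scipy.signal.kaiserord` estimates the minimum number of taps for a Kaiser-window FIR filter that meets the chosen attenuation and transition width.

```python
import numpy as np
from scipy.signal import filtfilt, firwin, hilbert, kaiserord


def normalise(x):
    """Step (ii): scale audio to [-1, 1] by its peak absolute amplitude."""
    x = np.asarray(x, dtype=float)
    return x / np.max(np.abs(x))


def clip_exhalation(x, fs, threshold=0.2):
    """Step (iii): keep from 1 s before the onset to the end of the recording.

    The onset is the first sample whose magnitude exceeds `threshold`
    (a hypothetical value; the text does not give one).
    """
    onset = int(np.argmax(np.abs(x) > threshold))
    return x[max(onset - int(fs), 0):]


def flow_envelope(x, fs, cutoff_hz=20.0, width_hz=20.0, atten_db=60.0):
    """Step (iv): Hilbert envelope, then Kaiser-window FIR low-pass smoothing.

    cutoff_hz, width_hz and atten_db are illustrative, not the paper's values.
    Returns (raw Hilbert envelope, smoothed envelope).
    """
    raw_env = np.abs(hilbert(x))                                # magnitude of the analytic signal
    numtaps, beta = kaiserord(atten_db, width_hz / (fs / 2.0))  # estimated minimum filter length
    taps = firwin(numtaps, cutoff_hz, window=("kaiser", beta), fs=fs)
    return raw_env, filtfilt(taps, [1.0], raw_env)              # zero-phase smoothing
```

On a synthetic amplitude-modulated signal with two harmonics (the kind of check described for Figure 17), the filtered envelope tracks the message signal much more closely than the raw Hilbert envelope, whose ripple comes from the beating between harmonics.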
Statistical features include a histogram of the signal, interquartile range, mean and median absolute deviation, kurtosis and skewness. Previous literature [4] lists all the statistical and temporal features and how they are computed for time series data; all the features are described in Table 4 in the Appendix. MFE features are known to distinguish speakers by modelling the shape of the vocal tract [28, 51]. Previous literature has used MFCC, power spectrum, temporal and statistical features to classify abnormal lung sounds and non-speech sounds [8, 38, 40]. (viii) Machine learning: Models: Given the extracted features and the spirometry ground truth, we train three supervised regression models, linear regression, random forest (RF) and support vector regression (SVR), to predict PEF, FEV1 and FVC from a set of 819 acoustic features and 390 statistical features from the FV curve. We chose these three algorithms for two main reasons. i) Linear regression is easy to implement and thus best suited for edge devices with limited compute capabilities. ii) RF and SVR can learn non-linear decision surfaces. They are also non-parametric (SVR with a radial basis function (RBF) kernel is non-parametric), which means that their complexity grows as more data becomes available. Specific features: To predict PEF, we used the cumulative sum of MFE, log MFE, MFCC MVN, power spectrum and Mel spectrogram as input to the regression model. To predict FEV1, we clipped the audio waveform to the first second after the start of exhalation. These feature choices were inspired by previous literature [26]. For FVC, we used features from the entire audio waveform. Feature selection: We used sequential forward selection (SFS) [18] to reduce the feature space, for faster computation and to prevent overfitting. Our objective is to estimate respiration rate from tidal breathing.
The first step in this process is to classify the audio samples as speech, tidal breathing or noise. We then apply a peak detection algorithm to every sample classified as tidal breathing to derive the respiration rate. Classification task: The entire pipeline for estimating respiration rate is explained below.
• Feature generation: Expiratory nasal sound and airway sound from the mouth lie in the frequency band of 300-4000 Hz [8, 42]. For each audio sample, we used a band-pass filter with a low cut-off frequency of 50 Hz and a high cut-off frequency of 500 Hz. This filter eliminates background noise; the cut-off values were found experimentally. The audio samples were sliced using a rectangular window, as shown in Figure 7(b). We describe the choice of window size and the offset between subsequent windows in Section 5.
• Estimating respiratory rate: We used a respiratory rate detection algorithm on the samples labelled as tidal breathing by the 1D CNN. We applied the Hilbert transform envelope detection described in Section 4.1 to these audio samples and then a peak detection algorithm [60] to the Hilbert envelope. The number of peaks gives the number of breaths in the sample, from which the respiration rate follows.
Signal processing: The amplitude of the forced breathing audio recordings was normalised between -1 and 1, and the recordings were clipped in time. The Hilbert transform was then applied to derive the approximate flow versus time curve using the pipeline explained in Section 4. The shape of the flow-volume curve depends on the design of a minimum-order FIR filter. We need a minimum-order filter to save processing time for each sample and to ensure numerical stability [44, 66]. The transition width and the stopband ripple of an FIR filter determine its order. We used a transition width of 2 (where the sampling rate is 12 kHz) and a stopband ripple of −10 to obtain the correct shape of the flow-volume curve.
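The tidal-breathing steps described above (band-pass filtering, rectangular windowing, Hilbert envelope and peak detection) can be sketched in Python as follows. The sketch assumes each window has already been classified as tidal breathing, so the 1D CNN is not reproduced. Only the 50-500 Hz pass band comes from the text. The Butterworth design, filter order, window and hop lengths, minimum peak spacing and prominence threshold are illustrative assumptions, and both function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy.signal import butter, find_peaks, hilbert, sosfiltfilt


def frame_signal(x, win, hop):
    """Slice audio into rectangular (untapered) windows of `win` samples every `hop` samples."""
    return sliding_window_view(np.asarray(x, dtype=float), win)[::hop]


def respiration_rate(audio, fs, low_hz=50.0, high_hz=500.0, min_gap_s=1.5):
    """Breaths per minute for one window already labelled as tidal breathing."""
    sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, audio)                        # keep the 50-500 Hz band
    envelope = np.abs(hilbert(filtered))                      # breathing intensity envelope
    peaks, _ = find_peaks(envelope,
                          distance=int(min_gap_s * fs),       # at most one breath per gap
                          prominence=0.3 * envelope.max())    # ignore small ripples
    minutes = len(audio) / fs / 60.0
    return len(peaks) / minutes                               # scale breath count to one minute
```

In a full pipeline, each frame from `frame_signal` would first go through the classifier, and `respiration_rate` would be applied only to frames labelled as tidal breathing. On a synthetic 20 s, 16 kHz recording whose 200 Hz tone is modulated at 15 breaths per minute, the sketch finds 5 peaks and reports 15 breaths per minute.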
Finally, one of the investigators visually checked the flow-volume curve of every participant for the correct shape [22, 27]. In future work, a machine learning model will classify correct and incorrect flow-volume curves based on the approach in previous work [27]. Figure 8 (c) and Figure 8 (f) show a correct and an incorrect shape of the flow-volume curve, respectively. While Figure 8 (c) shows end-expiratory curvilinearity [30], the same curvilinearity is missing in Figure 8 (f). Besides curvilinearity, we also relied on previous literature [27] to distinguish between correct and incorrect shapes of the flow-volume curve. A proper shape implies a successful forced breathing maneuver. Cross-validation: For both RF and SVR, we use a nested leave-one-subject-out cross-validation strategy (LOOCV). The outer loop is used to predict the lung parameters for a test participant, with all other participants in the training set. The inner loop is used to fine-tune the hyperparameters. Metric: We report the mean percentage error across all participants for FEV1, FVC and PEF. The percentage error is given by $\frac{|y - \hat{y}|}{y} \times 100$, where $y$ is the ground truth value and $\hat{y}$ is the estimated value. Tuning Hyperparameters: The number of trees in RF ranges from 5 to 500 in steps of 10. The nodes in RF were expanded until the leaves contained fewer than two samples. For SVR, the hyperparameter space consisted of the regularisation parameter and the type of kernel (linear, radial basis function or polynomial). We used a grid search strategy to find the optimal hyperparameters. We discuss the results of forced breathing in Section 5.2. Our overall result for the N95 mask is shown in Figure 9. The RF regression performs best across participants with both healthy and unhealthy lung conditions. Across all participants, we achieve a Mean Percentage Error (MPE) of 5.3% on PEF, 4.72% on FEV1 and 5.11% on FVC for the N95 mask.
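The nested leave-one-subject-out procedure and the MPE metric can be sketched in plain Python as follows. The `fit`/`predict` callables and the hyperparameter `grid` are hypothetical placeholders standing in for RF or SVR, for example a grid over the number of trees. Averaging the inner-loop error across validation subjects is also an assumption, because the text does not say how inner-loop scores are aggregated.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def percentage_error(y_true: float, y_pred: float) -> float:
    """|y - y_hat| / y * 100, with y the spirometer ground truth."""
    return abs(y_true - y_pred) / y_true * 100.0


def nested_loso_mpe(
    subjects: Sequence[str],
    X: Sequence[object],
    y: Sequence[float],
    grid: Sequence[object],
    fit: Callable[[List[object], List[float], object], object],
    predict: Callable[[object, object], float],
) -> Tuple[float, Dict[str, float]]:
    """Nested leave-one-subject-out CV returning (MPE, per-subject error).

    Outer loop: hold out one subject for testing.
    Inner loop: leave-one-subject-out over the remaining subjects to pick
    the hyperparameter value from `grid` with the lowest mean error.
    """

    def mean_error(train: List[int], test: List[int], h: object) -> float:
        model = fit([X[i] for i in train], [y[i] for i in train], h)
        return mean(percentage_error(y[i], predict(model, X[i])) for i in test)

    per_subject: Dict[str, float] = {}
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        test = [i for i, s in enumerate(subjects) if s == held_out]
        inner_subjects = sorted({subjects[i] for i in train})

        def inner_score(h: object) -> float:
            # Mean validation error over inner leave-one-subject-out folds.
            return mean(
                mean_error(
                    [i for i in train if subjects[i] != v],
                    [i for i in train if subjects[i] == v],
                    h,
                )
                for v in inner_subjects
            )

        best_h = min(grid, key=inner_score)
        per_subject[held_out] = mean_error(train, test, best_h)
    return mean(per_subject.values()), per_subject
```

Grouping the splits by subject, rather than by recording, keeps every recording of the test participant out of both training and hyperparameter tuning. scikit-learn's `LeaveOneGroupOut` provides the same split when used with `GridSearchCV`.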
To explain the direction of bias, Figure 10 shows the Bland-Altman [5] plot of each lung function measure. The vertical axis shows the difference between SpiroMask and the spirometer, and the horizontal axis shows the mean value of the two methods. Measures for healthy participants are shown as blue dots and those for unhealthy participants as orange dots. The dashed lines indicate ±2 standard deviations of the difference. From the plot, it can be seen that SpiroMask generalises well across healthy and unhealthy participants. On average, the ground-truth spirometer gives slightly higher values for all three parameters. Although PEF has higher variability than FEV1 and FVC, the agreement holds for PEF as well. For the cloth mask (Figure 11), we have an MPE of 4.86% on PEF, 4.20% on FEV1 and 3.67% on FVC. The Bland-Altman plot for the cloth mask (Figure 12) shows a similar agreement between the spirometer and SpiroMask. Our results for all the lung parameters fall within the clinically relevant range. As mentioned in previous work [26], a clinically relevant range is used because a participant cannot use a spirometer and SpiroMask simultaneously, so the actual ground truth is unattainable. The ATS guidelines [30] mandate that the variability of a lung function measure over a short duration be within 7%. Our results on both masks are well within the ATS guidelines for both healthy and unhealthy participants. Details of confounding factors and trends are described in Section 7. Our result on PEF is better on the N95 mask. FEV1 and FVC are marginally better on cloth masks. In the future, we plan to study the comparative performance of the two masks using statistical tests. Fig. 9. For the N95 mask, the prediction error of the Random Forest regression model for participants with healthy and unhealthy lung conditions is acceptable as per the ATS guidelines [30].
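The quantities behind a Bland-Altman plot are straightforward to compute. The sketch below returns the per-pair means (horizontal axis), the differences (vertical axis), the mean difference (bias), and the limits of agreement at bias ± k·SD. It uses k = 2 to match the dashed lines described above; 1.96 is the other common choice. The use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def bland_altman(
    method_a: Sequence[float], method_b: Sequence[float], k: float = 2.0
) -> Tuple[List[float], List[float], float, float, float]:
    """Return (means, diffs, bias, lower, upper) for a Bland-Altman plot.

    diffs are a - b (e.g. SpiroMask minus spirometer); the limits of
    agreement are bias +/- k * sample SD of the differences.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(diffs)  # negative bias: the spirometer reads higher on average
    sd = stdev(diffs)
    return means, diffs, bias, bias - k * sd, bias + k * sd
```

With SpiroMask passed as `method_a` and the spirometer as `method_b`, a slightly negative bias corresponds to the observation that the spirometer gives slightly higher values on average.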
We have the following set of hyperparameters: the window size, the offset size and the Fast Fourier Transform (FFT) length (the choice of the hyperparameter space is described in the Appendix). The window size cannot be larger than the shortest audio sample, and the offset size can be at most as large as the window size. We used 6-fold cross-validation, where the inner loop was used for tuning hyperparameters. Rate. For all samples that were classified as tidal breathing, we applied the Hilbert transform to the sample, followed by peak detection. An algorithm compared the number of peaks in each tidal breathing sample with the ground truth accelerometer signal and returned the Mean Absolute Error (MAE) for each participant. Our result in Figure 13 shows that we achieve a classification accuracy of 94.7%. The confusion matrix shows that the CNN misclassified 2% of noise samples and 7% of speech samples as tidal breathing. As described in Section 4.2, we segmented each audio sample into smaller windows, and the classifier predicts the class of every window. The CNN assigns one class (Tidal Breathing, Noise or Speech) to an audio sample if 90% of the segmented windows of that sample belong to that class. We set a high threshold to reduce the number of false positives. Figure 13 also shows the percentage of samples that the CNN could not classify. A higher threshold leads to better accuracy, but at the cost of some samples being labelled as uncertain, whereas a lower threshold leads to more misclassification. For example, the peak detection algorithm would return an inaccurate respiration rate if the CNN labelled a speech sample as tidal breathing. Fig. 11. (Panel: Average Error For Participants Without Lung Discomfort.) For the cloth mask, the prediction errors are comparable to those for the N95 mask, although the standard deviation of FVC is higher. The prediction error across all participants is within the ATS guidelines [30] for the Random Forest regression model.
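The windowing constraints and the 90% voting rule can be sketched in a few lines. In this sketch, window and offset sizes are counted in samples, the function names are hypothetical, and "90% of the segmented windows" is read as "at least 90%". Any sample whose majority class falls short of the threshold is labelled "uncertain".

```python
from collections import Counter
from typing import List, Sequence


def window_starts(n_samples: int, window: int, offset: int) -> List[int]:
    """Start indices of rectangular windows of length `window`, hop `offset`.

    Enforces the constraints in the text: the window must fit in the sample,
    and the offset may not exceed the window size.
    """
    if not 0 < window <= n_samples:
        raise ValueError("window must be positive and no longer than the sample")
    if not 0 < offset <= window:
        raise ValueError("offset must be positive and at most the window size")
    return list(range(0, n_samples - window + 1, offset))


def vote(window_labels: Sequence[str], threshold: float = 0.9) -> str:
    """Return the majority window label if it covers >= `threshold` of windows.

    Otherwise the sample is set aside as "uncertain".
    """
    if not window_labels:
        return "uncertain"
    label, count = Counter(window_labels).most_common(1)[0]
    return label if count / len(window_labels) >= threshold else "uncertain"
```

For example, a sample with 9 of 10 windows predicted as tidal breathing is accepted as tidal breathing, while 8 of 10 is set aside as uncertain. Raising `threshold` trades coverage for fewer false positives, as described above.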
A higher threshold reduces misclassification by setting aside audio samples that cannot be confidently assigned to one class. We achieved a Mean Absolute Error (MAE) of 0.47 on the N95 mask and 0.36 on the cloth mask. Our results are comparable to previous work on estimating respiration rate using smartphones [36] and WiFi signals [69]. We observed a difference of 1 breathing cycle for some participants, which can be attributed to the presence of partial exhalation cycles. Note that the Hilbert Transform envelope of samples incorrectly classified as tidal breathing had an amplitude of zero, resulting in no peaks, and we thus reject such cycles. In this section, we analyse the robustness of our methods with respect to sensor placement inside the mask. The motivation for this experiment is that a person might retrofit the sensor at any position inside the mask, and SpiroMask is expected to monitor forced and tidal breathing without any drop in the accuracy of lung parameters. As described in Section 3, we collected forced breathing and tidal breathing samples for each participant by placing the sensor in five different positions inside the mask (Figure 4). For forced breathing, we had sensor position data for 12 participants (including 8 participants with unhealthy lungs). For tidal breathing, we had sensor position data for 18 participants (including 8 participants with unhealthy lungs). We used the model trained on audio samples collected with the sensor at position L1 and predicted on all samples from the other positions: C1, R1, L3 and R3. We used the same cross-validation strategy as described in Section 5. Figure 14 shows the main results on 12 participants (forced breathing) and 18 participants (tidal breathing). For forced breathing, SpiroMask is robust to sensor placement: the MPE for all positions is below 7% (the ATS acceptable error). For tidal breathing, however, locations L3 and R3 have the lowest MAE.
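The respiration-rate step (Hilbert envelope, then peak counting, then MAE against ground truth) can be sketched without dependencies. The envelope is the magnitude of the analytic signal, computed here with a small radix-2 FFT, so this sketch assumes the sample length is a power of two. `count_peaks` is a simplified, hypothetical stand-in for SciPy's `find_peaks` [60] with `height` and `distance` arguments. The 50%-of-maximum height threshold and the 1-second minimum gap are illustrative assumptions, not values from the paper. Converting peaks to breaths per minute assumes each envelope peak is one exhalation.

```python
import cmath
import math
from statistics import mean
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [
        even[k] - tw[k] for k in range(n // 2)
    ]


def hilbert_envelope(x: Sequence[float]) -> List[float]:
    """Envelope = |analytic signal|, built by zeroing negative frequencies."""
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch needs a power-of-two length >= 2")
    spec = fft(x)
    # DC and Nyquist bins kept once, positive bins doubled, negative bins zeroed.
    gain = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    # Inverse FFT via conjugation: ifft(X) = conj(fft(conj(X))) / n;
    # the outer conj does not change the magnitude, so it is skipped.
    analytic = fft([(g * s).conjugate() for g, s in zip(gain, spec)])
    return [abs(a) / n for a in analytic]


def count_peaks(env: Sequence[float], min_height: float, min_distance: int) -> int:
    """Count local maxima >= min_height that are >= min_distance samples apart."""
    peaks: List[int] = []
    for i in range(1, len(env) - 1):
        if env[i] >= min_height and env[i - 1] < env[i] >= env[i + 1]:
            if not peaks or i - peaks[-1] >= min_distance:
                peaks.append(i)
    return len(peaks)


def respiration_rate_bpm(x: Sequence[float], fs_hz: float,
                         min_height_frac: float = 0.5,
                         min_gap_s: float = 1.0) -> float:
    """Breaths per minute, assuming each envelope peak is one exhalation."""
    env = hilbert_envelope(x)
    n_peaks = count_peaks(env, min_height_frac * max(env), int(min_gap_s * fs_hz))
    return n_peaks * 60.0 * fs_hz / len(x)


def mean_absolute_error(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """MAE between ground-truth and estimated respiration rates."""
    return mean(abs(r - e) for r, e in zip(reference, estimate))
```

On a toy 16-second, 64 Hz signal whose 8 Hz carrier is amplitude-modulated at 0.25 Hz, the envelope has four peaks, giving 15 breaths per minute. An all-zero envelope yields no peaks at all, which matches how the text rejects such samples. For real recordings, `scipy.signal.hilbert` and `scipy.signal.find_peaks` are the usual choices.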
The low MAE at L3 and R3 can be attributed to the microphone being below the apex of the nose. None of the audio samples from L3 or R3 were misclassified during the classification task explained in Section 5. A detailed breakdown of MPE and MAE for forced and tidal breathing among healthy and unhealthy participants is given in Figure 19 and Figure 20 in the Appendix. SpiroMask does not analyse speech signals and is concerned only with forced and tidal breathing signals. This experiment evaluates robustness to lower sampling rates. According to previous literature [14], speech intelligibility reduces drastically when the sampling rate is below 8 kHz, which alleviates privacy concerns. Figure 15 shows the classification accuracy as the sampling rate decreases. The classification accuracy remains high (91.7%) even when the sampling rate is reduced to 8 kHz, since tidal breathing information lies below 4 kHz, the Nyquist frequency at an 8 kHz sampling rate. The decrease in accuracy is attributed to noise and speech samples being labelled as 'uncertain' at lower sampling rates. As mentioned in Section 4.2, the peaks in the envelope of the tidal breathing signal correspond to exhalations. Figure 16 shows that the mean absolute error of peak detection (and consequently of the respiration rate) on tidal samples increases as the sampling rate decreases. Spectrogram analysis has shown that tidal breathing information is present below 4000 Hz. Reducing the sampling frequency to 8000 Hz and below increases the mean absolute error by a small margin (MAE = 1 at 8 kHz). Therefore, to preserve privacy, we can use lower sampling rates at the cost of a slight decrease in the accuracy of the respiration rate. We performed 8-way multivariate analysis of variance (MANOVA) [64] tests to determine whether other variables significantly contributed to the difference (in terms of percent error) between SpiroMask and the spirometer in estimating the forced breathing parameters FEV1, FVC and PEF.
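The sampling-rate argument rests on the Nyquist criterion. A recording sampled at f_s can only represent content below f_s/2, and any component above that folds back (aliases) into the lower band unless it is low-pass filtered before resampling. The hypothetical helpers below illustrate why an 8 kHz rate still keeps breathing content below 4 kHz, while higher-frequency content is either removed by the anti-aliasing filter or, without one, misplaced.

```python
def apparent_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency at which a real tone of f_hz appears after sampling at fs_hz.

    Components above the Nyquist frequency fs/2 fold back into [0, fs/2].
    """
    folded = f_hz % fs_hz
    return min(folded, fs_hz - folded)


def survives_resampling(max_content_hz: float, fs_hz: float) -> bool:
    """True if all content up to max_content_hz lies below Nyquist (fs/2)."""
    return max_content_hz < fs_hz / 2
```

For instance, at 8 kHz a 3 kHz breathing component is preserved, whereas an unfiltered 5 kHz component would masquerade as 3 kHz. This is why resampling pipelines low-pass filter before decimating.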
Fig. 15. Speech becomes unintelligible below 8 kHz [14]. The classification accuracy drops from 94.7% to 91.7% when the sampling rate is decreased from 16 kHz to 8 kHz. MANOVA is more suitable than univariate analysis of variance (ANOVA) tests for forced breathing since we have 3 dependent variables. We performed separate MANOVA tests for N95 and cloth masks. We used height, weight, gender, age, whether the subject had performed spirometry tests before, whether the subject reported any lung ailments, whether the subject has a habit of smoking, and whether the subject had a meal before the test as the 8 grouping variables. The test for the cloth mask shows that none of the grouping variables has a significant effect on the percent error between SpiroMask and spirometer measures (all p-values were greater than 0.05, the significance level of the test). The test for the N95 mask suggests that height could be a significant variable (p-value ≈ 0.04). Since this is suggested only by the test for the N95 mask and not by the test for the cloth mask, further investigation is needed to determine how significant height is. We provide scatter plots showing the variation of percent error for the forced breathing parameters with respect to height (for the N95 mask) in Section A.4 of the Appendix. The presence of a trend between FVC and FEV1 and height suggests that incorporating such information whenever available will further improve our models. We leave the detailed ablation study with participant features like We also performed 7-way analysis of variance (ANOVA) tests to determine whether other variables significantly contributed to the difference (in terms of percent error) between SpiroMask and a smartphone strapped to the chest in estimating the tidal breathing parameter (respiration rate). We performed separate ANOVA tests for N95 and cloth masks.
We used height, weight, gender, age, whether the subject reported any lung ailments, whether the subject has a habit of smoking, and whether the subject had a meal before the test as the 7 grouping variables. The tests for both N95 and cloth masks show that none of the grouping variables has a significant effect on the percent error between SpiroMask and smartphone measures (all p-values were greater than 0.05, the significance level of the test). Refer to Section A.4 in the Appendix for more detailed results of the MANOVA and ANOVA tests. We collected feedback through an exit survey with three optional questions: whether participants preferred a spirometer or SpiroMask (for measuring forced breathing parameters), whether they preferred SpiroMask or a phone strapped to the chest with a belt (for measuring tidal breathing parameters), and whether they preferred a cloth mask or an N95 mask. Around 58% of the responses preferred SpiroMask over a spirometer, while around 21% were equally comfortable with both methods. The main reasons cited by respondents preferring SpiroMask over a spirometer included: "mask was more comfortable", "forced exhalation was easier to perform in a mask", and "spirometer was hard to hold and operate, it is bulky". Some of the older subjects commented that "it becomes hard to hold the spirometer in the mouth for a long time" (subject no. 16), and "mask is preferable as spirometer required breathing using mouthpiece which was a bit difficult" (subject no. 7). Interestingly, 75% preferred a cloth mask over an N95 mask, mainly for comfort. As mentioned in Section 4, we used the accelerometer of a smartphone to collect ground truth on respiration rate. However, we could obtain accelerometer ground truth only for a few participants; for the rest, we relied on self-counting as the ground truth.
Among those who wore the belt with the strapped smartphone, 91% preferred SpiroMask over the phone for monitoring tidal breathing during the user study. One participant said, "the chest belt seemed to restrict breathing," and others voiced similar opinions. When the investigator asked whether they would prefer to wear a chest belt or SpiroMask as part of their daily life, the participants unanimously favoured SpiroMask. We now discuss some limitations of our work and the future work proposed to address them: • Human motion hinders breathing measurement. Prior research has shown that sensors worn on the chest can measure breathing parameters during activities such as walking [65]. However, wearing a chest belt is not always comfortable. We are currently investigating measuring respiration rate via SpiroMask when the person is ambulatory. Our initial results show that microphones can detect breathing signals even when the person is walking. • Our confounding variable analysis suggests that incorporating personal parameters such as height can help improve the estimation of forced breathing parameters. In the future, we plan to conduct an ablation study of the potential improvements in modelling when personal health parameters are available. Previous smartphone spirometry work has shown improvement via personalisation [25], though its notion of personalisation was different. • A detailed large-scale user study on the usability of SpiroMask is out of the scope of our current research. In our current work, we evaluated the percentage of participants who would prefer SpiroMask over a traditional spirometer or a chest belt. • Currently, SpiroMask cannot detect inhalations due to their low amplitude. This is a known problem in smartphone spirometry [25]. However, unlike a smartphone, SpiroMask uses a custom microphone, so we plan to experiment with a multi-microphone setup, with one high-gain microphone for inhalation and another for exhalation.
• In our current work, we do not estimate the tidal volume. This is primarily due to the lack of a clinical-grade ground truth device for tidal volume estimation. We believe that our existing pipeline for forced volume should be easily adaptable to estimating tidal volume. • In the current work, we predict specific points on the flow-volume curve (such as FEV1 and FVC). In the future, we plan to predict the whole flow-volume curve instead of only these points. This problem maps naturally to a sequence-to-sequence learning problem [50]. We plan to leverage recent advances in neural-network-based sequence-to-sequence methods for this task [31, 71]. • Our current prototype requires external power. Recent work in the community has leveraged triboelectric nanogenerators to develop self-powered acoustic sensors [3]. In the future, we would like to explore these advances to make the system self-powered. • Our work shows that lung health can be diagnosed by monitoring forced or tidal breathing via a microphone placed inside a consumer-grade N95 or cloth mask. We optimised the Convolutional Neural Network (CNN) classifier described in Section 4.2 to run in real time on the Arduino Nano 33 BLE Sense microcontroller. We used offline algorithms to deduce forced and tidal breathing parameters. We envision that SpiroMask will send the lung health parameters to the wearer's smartphone over Bluetooth Low Energy (BLE). In this paper, we presented a system for performing spirometry and continuous respiration rate monitoring using consumer-grade N95 and cloth masks. Forced and tidal breathing are used to derive lung health bio-markers such as the respiration rate and the volume of exhaled air. We showed that a retrofitted sensor placed inside an N95 or a two-layered cloth mask can estimate forced and tidal breathing parameters.
Our evaluation on 48 participants shows that the accuracy of wearable spirometry for forced breathing is well within the clinically accepted range for participants with and without lung ailments. Moreover, our work is comparable to existing research on portable spirometry and requires less complex modelling, making it possible to deploy on microcontrollers running machine learning models. Our subjective evaluation in our study population shows acceptability and ease of use compared to traditional spirometry. In this supplementary document, we provide additional details for our paper. The list of features is described in Table 4.
A.3 Ablation Studies
Tables 8 and 9 show the results obtained in the 8-way MANOVA and 7-way ANOVA tests, respectively. MANOVA was performed for the forced breathing parameters and ANOVA for the tidal breathing parameter (respiration rate). Both MANOVA and ANOVA were performed for cloth and N95 masks. The 8 grouping variables used in the MANOVA tests were height, weight, gender, age, whether the subject had performed spirometry tests before, whether the subject reported any lung ailments, whether the subject has a habit of smoking, and whether the subject had a meal before the experiment. The 7 grouping variables used in the ANOVA tests were height, weight, gender, age, whether the subject reported any lung ailments, whether the subject has a habit of smoking, and whether the subject had a meal before the experiment. Figure 18 presents the variation of percent error for the forced breathing parameters with respect to height (the grouping variable with the lowest p-value in the MANOVA test) for the N95 mask.
Whether the subject reported any lung ailment 0.4152 0.8152
Table 9. Results of the ANOVA test for both cloth and N95 masks are shown above.
None of the grouping variables has a significant effect on the percent error between SpiroMask and smartphone measures (all p-values are greater than 0.05, the significance level of the test).

Naqaab: towards health sensing and persuasion via masks.
Wearable technology: role in respiratory health and disease.
SATURN: A thin and flexible self-powered microphone leveraging triboelectric nanogenerator.
TSFEL: Time series feature extraction library.
Statistical methods for assessing agreement between two methods of clinical measurement.
COPD's early origins in low-and-middle income countries: what are the implications of a false start.
Korosh Vatanparvar, and Jilong Kuang. 2019. mLung++ automated characterization of abnormal lung sounds in pulmonary patients using multimodal mobile sensors.
BreathPrint: Breathing acoustics-based user authentication.
Respiration rate and volume measurements using wearable strain sensors.
Recommendations for a standardized pulmonary function report. An official American Thoracic Society technical statement.
RespWatch: Robust Measurement of Respiratory Rate on Smartwatches with Photoplethysmography.
Respiratory rate predicts cardiopulmonary arrest for internal medicine inpatients.
Factors governing the intelligibility of speech sounds.
Spirocall: Measuring lung function over a phone call.
Standardization of spirometry 2019 update. An official
The identification of risk factors for cardiac arrest and formulation of activation criteria to alert a medical emergency team.
An introduction to statistical learning.
The hilbert transform. Mathematics Master's Thesis.
The impact of the COVID-19 pandemic on sleep medicine practices.
Onur Avci, and Moncef Gabbouj. 2019. 1-d convolutional neural networks for signal processing applications.
The maximal expiratory flow-volume curve: normal standards, variability, and effects of age.
Addressing reduced laboratory-based pulmonary function testing during a pandemic.
Machine Learning for Sleep Apnea Detection with Unattended Sleep Monitoring at Home.
SpiroSmart: using a microphone to measure lung function on a mobile phone.
Accurate and privacy preserving cough sensing using a low-cost microphone.
Automatic characterization of user errors in spirometry.
Mel filter bank energy-based slope feature and its application to speaker recognition.
Challenges in realizing smartphone-based health sensing.
Standardisation of spirometry.
Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation.
Linear predictive coding.
The use of home spirometry in detecting acute lung rejection and infection following heart-lung transplantation.
A comparison of the energy operator and the Hilbert transform approach to signal and speech demodulation.
BreathEasy: Assessing Respiratory Diseases Using Mobile Multimodal Sensors.
Towards reliable data collection and annotation to extract pulmonary digital biomarkers using mobile sensors.
BodyBeat: a mobile system for sensing non-speech body sounds.
Lung disease in a global context. A call for public health action.
The INTERSPEECH 2010 paralinguistic challenge.
Getting synchronized with the metronome: Comparisons between phase and period correction.
Frequency spectra of normal expiratory nasal sound.
Real-time heart rate variability extraction using the Kaiser window.
Introduction to digital signal: Processing and filter design.
Breeze: Smartphone-based acoustic real-time detection of breathing phases for a gamified biofeedback breathing training.
Mobile devices and health.
SpiroSonic: monitoring human lung function via acoustic sensing on commodity smartphones.
Screening for and early detection of chronic obstructive pulmonary disease.
Quadratic correlation time delay estimation algorithm based on kaiser window and hilbert transform.
Sequence to sequence learning with neural networks.
Novel phase encoded mel filterbank energies for environmental sound classification.
Estimation of inhalation flow profile using audio-based methods to assess inhaler medication adherence.
ISO 14971-Medical Device Risk Management Standard.
High-resolution time-frequency spectrum-based lung function test from a smartphone microphone.
Continuous remote monitoring of COPD patients-justification and explanation of the requirements and a survey of the available technologies.
Spirometry in the occupational health setting-2011 update.
Tidal breathing analysis as a measure of airway obstruction in children three years of age and older.
Early detection of chronic obstructive pulmonary disease (COPD): the role of spirometry as a diagnostic tool in primary care.
SciPy 1.0: fundamental algorithms for scientific computing in Python.
SpiroConfidence: determining the validity of smartphone based spirometry using machine learning.
Clinical methods: the history, physical, and laboratory examinations.
Smartphone Sonar-Based Contact-Free Respiration Rate Monitoring.
A primer on multivariate analysis of variance (MANOVA) for behavioral scientists.
A-spiro: Towards continuous respiration monitoring.
Analog and digital filter design.
Listen2Cough: Leveraging End-to-End Deep Learning Cough Detection Model to Enhance Lung Health Assessment Using Passively Sensed Audio.
Acoustic airflow estimation from tracheal sound power.
Extracting multi-person respiration from entangled rf signals.
Understanding and improving recurrent networks for human activity recognition by continuous attention.
Sequence-to-point learning with neural networks for non-intrusive load monitoring.
Accurate Spirometry with Integrated Barometric Sensors in Face-Worn Garments.
MobSpiro: Mobile based spirometry for detecting COPD.