key: cord-1017099-rnwrsj1g
authors: Georgiou, Georgios P.
title: Acoustic markers of vowels produced with different types of face masks
date: 2022-03-30
journal: Appl Acoust
DOI: 10.1016/j.apacoust.2022.108691
sha: 12ac0f66a607d18aa0872bb23e0c81fbbe39e4a5
doc_id: 1017099
cord_uid: rnwrsj1g

The wide spread of SARS-CoV-2 led to the extensive use of face masks in public places. Although masks offer significant protection from infectious droplets, they also impact verbal communication by altering the speech signal. The present study examines how two types of face masks affect the speech properties of vowels. Twenty speakers were recorded producing their native vowels in a /pVs/ context, maintaining a normal speaking rate. Speakers were asked to produce the vowels in three conditions: (a) with a surgical mask, (b) with a cotton mask, and (c) without a mask. The speakers' output was analyzed with the Praat speech analysis software. We fitted three linear mixed-effects models to investigate the mask-wearing effects on the first formant (F1), second formant (F2), and duration of vowels. The results demonstrated that F1 and duration of vowels remained intact in the masked conditions compared to the unmasked condition, while F2 was altered for three out of five vowels (/e a u/) in the surgical mask and two out of five vowels (/e a/) in the cotton mask. Thus, both types of masks altered the speech signal to some extent, and they mostly affected the same vowel qualities. It is concluded that some acoustic properties are more sensitive than others to speech signal modification when speech is filtered through masks, and that different sounds are affected in different ways. The findings may have significant implications for second/foreign language instructors who teach pronunciation and for speech therapists who teach sounds to individuals with language disorders.
During the COVID-19 outbreak, face masks became must-use equipment for billions of people as nations around the world attempted to restrain the spread of the disease [30]. The role of face masks, and especially certain types of masks such as surgical and N95, has been proven important for the reduction of virus transmission [8, 26, 31] since they prevent the emission of infected particles and droplets from expiratory activities such as breathing, coughing, sneezing, and speaking [21, 15]. Although face masks offer protection from virus infection, they negatively affect verbal communication and consequently intelligibility [3, 10, 22, 27]. Some studies have shown that the speech signal is modified through masks [6, 10, 13, 34, 35], decreasing the transmission of speech by an estimated 3-4% [32]. Also, when a speaker wears a face mask, the listener cannot follow the articulatory gestures of the speaker (e.g., lip and jaw movements), which are as important as auditory cues for the understanding of speech [18]; among other things, this might negatively impact the communication of individuals with hearing loss or communication disorders [7].

Face masks can be categorized into three main types: respirators, medical masks, and cloth masks (Nguyen et al., 2021). Different mask types offer different levels of protection, and they affect the speech signal to a different extent depending on their fabric, shape, and fit. Corey et al. [10] investigated how different types of face masks attenuate the speech signal. Specifically, they tested surgical masks, N95 and KN95 respirators, six cloth masks made out of different fabrics, two cloth masks with transparent windows, and a plastic shield. They found that frequencies above 1 kHz were attenuated by all mask types. With respect to the effect of mask type on the speech signal, surgical masks offered the best acoustic performance, followed by loosely woven cotton masks.
Masks with transparent windows and plastic shields offered worse acoustic performance than medical and cloth masks. In general, the quality of the speech signal varied among cloth masks made of different materials. Maryn et al. [29] assessed how the sound properties of speech and noise are affected by three different types of respiratory protective masks (disposable surgical mask, FFP2 mask, and transparent plastic mask). They investigated sound properties that corresponded to voice production, voice quality, articulation, and resonance. The largest acoustic differences between masked and unmasked conditions were observed for the transparent plastic mask, while the smallest acoustic differences between the two conditions were observed for the surgical mask. Evidence about the high quality of the speech signal produced with a surgical mask is provided by Maryn [28]. The author examined whether the use of surgical masks affects various speech markers such as sound intensity level, fundamental frequency, jitter local, shimmer local dB, smoothed cepstral peak prominence, and Acoustic Voice Quality Index. For this purpose, she recorded individuals with and without surgical masks. The author found no significant differences between the masked and the unmasked conditions, and hence concluded that surgical masks can transmit a high-quality speech signal. Similarly, Fiorella et al. [14] examined the effect of the surgical mask on vocal aspects (i.e., F0, vocal intensity, jitter, shimmer, and harmonics-to-noise ratio) to assess how voice and verbal communication are affected. All subjects produced a vocal sample with and without a surgical mask. No significant differences were found between the masked and unmasked conditions for any of the acoustic markers, while a non-significant reduction of vocal intensity was observed in the majority of the subjects.
Few studies have examined the effect of mask-wearing on individual sounds or specific types of sounds. Fecher & Watt [13] tested how various types of face-covering impact the properties of fricatives. Ten speakers were asked to produce 18 English consonants. The speech material that was analyzed included the voiceless sibilants /s ʃ/ and non-sibilants /f h/. The results revealed that specific masks attenuated some significant speech properties of sibilants, rendering the distinction between sibilants and non-sibilants challenging. Saigusa [35] studied the effect of face-covering (motorcycle helmet, balaclava, and plastic mask) on the acoustic features of fricatives /f h v/. The author concluded that there was a significant effect of facewear on the sounds' intensity as well as the centre of gravity, standard deviation, skewness, and kurtosis. This would make /h/ and /f/ indistinguishable, with significant implications for forensic studies. The effect of face-covering (helmet, mask, niqāb) on the acoustic properties of vowels (F1, F2, duration) was examined by Abbasi et al. [1]. The results indicated that all types of face-covering had a significant effect on the formants of /ə/, while the mask did not affect any speech feature of /a:/. This study analyzed the speech properties of only two Pahari central vowels, /ə/ and /a:/, instead of the full range of vowels found in the phonological inventory of the language. Georgiou [17] tested whether face masks exercised any effect on the spectral and qualitative characteristics of the Greek vowels /i e a o u/. The findings showed a few differences between the masked and the unmasked conditions, which mostly affected the F2 of some vowels. The effect of mask type was not considered. Joshi et al. [23] investigated the effect of different types of face masks on voice measures such as F0, F1, and F2 for vowels /a/ and /i/ and smoothed cepstral peak prominence (CPPs) for /a/.
The results indicated that there were no significant differences for F0 and F1, but some different effects were found for the F2 of /a/ between males and females. For example, there was a drop in F2 for males wearing KSF masks and for females wearing cloth masks. Also, there was an increase in F2 for females wearing surgical masks. Finally, Nguyen et al. (2021) compared measures of voice markers in 16 adults who produced vocal tasks with a surgical mask and a KN95 mask. The findings demonstrated that there were no significant differences for vowel mean spectral levels and intensity between the mask-wearing and the no mask conditions. The measurements of this study were confined to the production of vowel /a/.

Research on the acoustics of particular speech sounds produced with a mask is scarce. We need to know more about how basic segmental acoustic markers are affected by the use of face masks. This will allow us to know which sounds are most affected and which acoustic features of sounds are more vulnerable to alteration. The consideration of different mask types is also important for clinical decisions, since masks that offer optimal verbal communication should be chosen. The aim of this study is to investigate the effect of mask type on the acoustic properties of vowels. Specifically, it examines the effect of surgical and cotton face masks on the spectral (first and second formants) and quantitative (duration) features of native vowels. We selected these types of masks as they are two of the most widely used masks during the COVID-19 pandemic. Also, we focused on the investigation of vowels since previous studies (e.g., [4]) suggested that vowel production accuracy correlates with intelligibility, while consonant production accuracy does not. This study differs from other similar studies in that it focuses on the effect of mask type on individual sounds rather than on general speech patterns.
While there is some previous evidence about the effect of face-covering on individual sound production, this evidence either is mostly concerned with consonants rather than vowels or considers a limited range of vowels. Also, this study takes into consideration the effect on the first formant (F1), the second formant (F2), and duration of vowels, which are the minimum basic features that distinguish vowel categories; earlier studies mostly concentrated on other acoustic features of sounds such as intensity and fundamental frequency. The study's protocol was based on a speech production experiment in which participants were asked to produce native vowels without a mask, with a surgical mask, and with a cotton mask. The output was analyzed through speech analysis software to determine the extent of speech signal alteration in the three conditions for every individual vowel. The same protocol was used in Georgiou [17], but the effect of mask type was not considered. The results of the study offer important implications for second/foreign language instructors and health care practitioners such as speech therapists.

Twenty speakers of Cypriot Greek (12 males, 8 females) participated in the study. Their age range was 20-55 years (mean age = 33.1, SD = 9.36), and they were born and raised in Cyprus. All of them originated from moderate-income families. The participants did not report any visual, auditory, or language deficits. Prior to their participation in the speech experiments, they were informed about the study goals and signed a consent form.

The stimuli consisted of the 5 Cypriot Greek vowels /i e a o u/. The vowels were embedded in monosyllabic nonsense words with the structure /pVs/. The participants produced the words in the carrier phrase "to <keyword> íne kaló" ("the <keyword> is good") as if speaking to a friend.

The test was completed in a sound-attenuated room and participants were tested individually.
The researcher advised participants to sit properly in front of a PC monitor, maintaining a consistent distance from it. The carrier phrases were presented through a Microsoft PowerPoint presentation, and participants were instructed to pronounce them as if speaking to a friend, maintaining a normal pace. The phrases were presented in a random order in 4 repetitions with an optional two-minute break at the midpoint. In total, the participants produced 20 randomized utterances each (5 vowels × 4 repetitions). The output was recorded with a Zoom H4n audio recorder at a 44.1 kHz sampling rate and saved as .wav files with a resolution of 24 bits. The distance between the participants' mouth and the recorder was approximately 30 cm and it was uniform amongst speakers. The target vowels, which were part of /pVs/ words, were isolated and segmented via Praat [5]. The participants completed the test three times in three different conditions: without a mask, with a surgical mask, and with a cotton mask. The masks completely covered the mouth and the nose of the participants and fitted their face closely. The characteristics of the masks are shown in Table 1.

The output was sent to Praat for speech analysis. The onsets and offsets of the target vowels were determined on the basis of the spectrograms and the waveforms. We provided measures of F1, F2, and duration. Formants are the acoustic resonances of the human vocal tract. F1 corresponds to vowel height, that is, how high the tongue is in the mouth (high vowels, mid vowels, low vowels), and F2 corresponds to vowel position, that is, how far back the tongue is in the mouth (front vowels, central vowels, back vowels). Fig. 1 shows the spectrogram and waveform of vowel /i/. For the generation of all tracks, the window length was 0.025 s, the pre-emphasis 50 Hz, and the spectrogram view range 5500 Hz.
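The elicitation design described above (five vowels in /pVs/ nonsense words, four repetitions, random order, repeated in each of the three conditions) can be illustrated with a minimal Python sketch. This is not the presentation procedure used in the study, which relied on PowerPoint; the function names and the `seed` parameter are hypothetical and serve only to make the arithmetic of the design explicit.

```python
import random

VOWELS = ["i", "e", "a", "o", "u"]          # the five Cypriot Greek vowels
REPETITIONS = 4                              # repetitions per vowel
CONDITIONS = ["no mask", "surgical mask", "cotton mask"]


def carrier_phrase(vowel):
    """Embed the vowel in a /pVs/ nonsense word inside the carrier phrase."""
    return f"to p{vowel}s íne kaló"


def presentation_order(seed):
    """Return one randomized list of carrier phrases for a single condition.

    `seed` is a hypothetical parameter used here only for reproducibility.
    """
    trials = [carrier_phrase(v) for v in VOWELS for _ in range(REPETITIONS)]
    rng = random.Random(seed)
    rng.shuffle(trials)
    return trials


order = presentation_order(seed=1)
utterances_per_condition = len(order)                      # 5 vowels x 4 repetitions
utterances_per_speaker = utterances_per_condition * len(CONDITIONS)
```

Under these assumptions each speaker contributes 20 utterances per condition, and every vowel appears exactly four times in each randomized list.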
For the measurement of spectral features, the initial point of the vowel's acoustic analysis was specified as the end of the noise of the first consonant /p/ and the onset of the vowel (V). The last point of the vowel's acoustic analysis was specified as the end of the vowel (V) and the beginning of the noise of the second consonant /s/. The target vowels' boundaries were set manually by the researcher. Vowel formants were measured at the vowel midpoint. Vowel duration was calculated as the interval between vowel onset and vowel offset. A Praat script was used to automatically extract the participants' measurements (F1, F2, duration), which were ultimately saved in a Microsoft Excel file.

For our analysis, we used mixed-effects models from the lmerTest package [24] in R [33]. The dependent variables were F1, F2, and vowel duration. The productions were normalized for peak intensity in Praat. To mitigate any gender effects, formants were normalized with the NORM software package [37]. The final model included vowel (5 Greek vowels), condition (no mask/surgical mask/cotton mask), and vowel × condition as fixed slopes, while speaker was modeled as a random slope. The pairwise comparisons were conducted with the emmeans package [25] in R. The total number of analyzed productions was 1200 (i.e., 20 speakers × 3 conditions × 5 vowels × 4 repetitions).

We fitted three linear mixed-effects models to investigate the effect of face mask type on F1, F2, and duration of vowels as produced by native speakers. Table 2 shows the results of the analysis. As illustrated in Fig. 2, vowels /i o/ seem to spectrally overlap in all conditions, while /a/ and /u/ seem to spectrally overlap in the surgical-cotton mask condition and the no mask-cotton mask condition respectively.
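As a brief aside on the measurement definitions and the design arithmetic given in the method, the following Python sketch shows how duration and the formant measurement point follow from the two manually set boundaries, and how the total of 1200 productions arises. It is only an illustration: the study used a Praat script, and the class name `VowelToken` and the boundary times below are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class VowelToken:
    onset_s: float   # vowel onset (end of the /p/ noise), in seconds
    offset_s: float  # vowel offset (start of the /s/ noise), in seconds

    def duration_ms(self):
        """Duration is the interval between vowel onset and vowel offset."""
        return (self.offset_s - self.onset_s) * 1000.0

    def midpoint_s(self):
        """Time point at which F1 and F2 would be read (vowel midpoint)."""
        return (self.onset_s + self.offset_s) / 2.0


# Hypothetical boundaries, not data from the study.
token = VowelToken(onset_s=1.250, offset_s=1.375)
duration = token.duration_ms()
midpoint = token.midpoint_s()

# Design arithmetic: speakers x conditions x vowels x repetitions.
SPEAKERS, CONDITIONS, VOWELS, REPETITIONS = 20, 3, 5, 4
total_productions = SPEAKERS * CONDITIONS * VOWELS * REPETITIONS
```

With these made-up boundaries the token lasts 125 ms and its formants would be read at 1.3125 s; the product of the four design factors gives the 1200 analyzed productions.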
By contrast, vowel /e/ does not occupy the same position in the vowel space in the three conditions (particularly regarding F2), vowel /a/ in the no mask condition is more peripheral compared to the other two conditions, and vowel /u/ in the surgical mask condition is more peripheral than in the other two conditions. Table 3. The [...]. So, only F2 was affected by mask wearing. Specifically, the F2 of two vowels was altered when speakers produced them with a cotton mask and the F2 of three vowels was altered when speakers produced them with a surgical mask.

Table 2. Results of the linear mixed-effects models for the effects of vowel, condition, and vowel × condition on F1, F2, and Duration. The table shows the estimate for the intercept and the coefficients of the model, the standard errors (SE), the degrees of freedom (df), the t-value, and the p-value (Pr(>|t|)). The intercept is vowel /a/ in the cotton mask condition.

The duration of vowels was also examined in the three conditions. Fig. 3 illustrates the duration of each vowel as produced by the native speakers in all conditions. Some initial differences can be spotted. First, most of the vowels produced without a mask had shorter durations than those produced in the masked conditions. The duration of the vowels produced with a surgical mask was longer than that of those produced with a cotton mask. The analysis demonstrated that there was a significant effect of vowel on duration for /i/ (b = −42.950, SE = 6.081, t = −7.06, p < 0.001), /e/ (b = −15.600, SE = 6.081, t = −2.57, p < 0.05), and /u/ (b = −39.550, SE = 6.081, t = −6.50, p < 0.001). Also, there was a significant effect of surgical mask on duration (b = 15.100, SE = 6.081, t = 2.48, p < 0.05).
The pairwise comparisons with the Tukey post-hoc test revealed no significant differences between corresponding vowels in the three conditions apart from the duration of /e/ in the no mask vs. surgical mask comparison (b = −25.80, SE = 6.08, t = −4.24, p = 0.028, d = −1.34). So, duration was not substantially altered in the masked vs. unmasked conditions.

To examine whether vowels can be distinguished when face masks are used, we compared the vowels' acoustic features (F1, F2, and Duration) produced with surgical and cotton masks with those produced without a mask. Vowel signal alteration occurs when at least one of a vowel's acoustic features in a masked condition significantly differs from the corresponding feature in the no mask condition. As seen in Table 3, F2 and duration of vowel /e/ in the surgical mask and F2 of the same vowel in the cotton mask differ from those in the no mask condition. F2 of vowel /a/ also differs between the surgical/cotton and no mask conditions, while F2 of vowel /u/ in the surgical mask differs from that in the no mask condition. Thus, the signal of vowels /e a u/ is altered when produced with a surgical mask, and the signal of vowels /e a/ is altered when produced with a cotton mask (see Table 3).

The present study examined the speech properties of vowels as produced by their native speakers with and without face masks. Specifically, 20 Cypriot Greek speakers were asked to produce the 5 Cypriot Greek vowels in carrier phrases with cotton and surgical masks and without a face mask. Their productions were analyzed through speech analysis software. F1, F2, and duration of the vowels in the three conditions were extracted automatically with a Praat script. The findings of this study demonstrated that face masks modify the speech signal to some extent, since some speech properties of vowels produced with face masks differed significantly from the properties of vowels produced without a face mask.
This corroborates earlier findings that observed signal modification in speech produced through masks (e.g., [2, 13, 22, 35]). Fecher [12] argued that there are two main factors that determine the alteration of the speech signal produced while wearing a face mask. First, two important phonetic theories, the source-filter theory of speech production developed by Fant [11] and the quantal theory of speech [36], hold that even a minor modification of articulatory gestures can result in the production of sounds that are acoustically different from those that are typically produced. Speakers may try to reposition their articulators when speaking through face masks in order to produce clearer speech. This may have a significant effect on the speech properties of the produced sounds. Second, depending on the material of the face mask, an amount of energy may be absorbed by the mask, which usually fits the face closely. Finally, when air particles hit the material of the mask, additional turbulence may be produced, thereby affecting the quality of the speech signal. Thus, face masks do alter the speech signal at least to some extent, and what reaches the listener is not exactly what was produced.

The findings showed that not every vowel under investigation was affected in the same way by the use of face masks. Similarly, Cohn et al. [9] pointed out that the degree to which face masks impact intelligibility depends on the type of articulation. For instance, face masks can decrease the aspiration of voiceless plosives or hinder the full lip projection of labials. In our study, vowels /i o/ remained unaffected by the use of any type of mask, whereas the properties of vowels /e a u/ were altered in the masked conditions.
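The alteration criterion stated in the results (a vowel counts as altered in a masked condition when at least one of its features differs significantly from the no mask condition) and the outcome just summarized can be expressed as a short Python sketch. The dictionary below merely transcribes the significant differences reported in the text for Table 3; the variable and function names are hypothetical, and no statistics are recomputed.

```python
VOWELS = ["i", "e", "a", "o", "u"]

# Features reported in the text as significantly different from the
# no mask condition (transcribed from the description of Table 3).
SIGNIFICANT_DIFFERENCES = {
    "surgical mask": {"e": {"F2", "duration"}, "a": {"F2"}, "u": {"F2"}},
    "cotton mask": {"e": {"F2"}, "a": {"F2"}},
}


def altered_vowels(condition):
    """Vowels with at least one feature differing from the no mask condition."""
    differences = SIGNIFICANT_DIFFERENCES[condition]
    return [v for v in VOWELS if differences.get(v)]


def unaffected_vowels():
    """Vowels with no significant difference in any masked condition."""
    return [
        v for v in VOWELS
        if not any(d.get(v) for d in SIGNIFICANT_DIFFERENCES.values())
    ]


surgical = altered_vowels("surgical mask")
cotton = altered_vowels("cotton mask")
unaffected = unaffected_vowels()
```

Applied to the reported differences, the criterion yields /e a u/ for the surgical mask, /e a/ for the cotton mask, and /i o/ as unaffected, in line with the summary above.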
A notable finding that emerged from the present study is that F1 and duration (except in the case of /e/ in the surgical mask) did not differ between the masked and unmasked conditions, while F2 differed, as some vowels were more or less peripheral in the masked conditions. In other words, face masks mostly impact the front/back movement of the tongue rather than its low/high position in the mouth. This agrees to some extent with the findings of Joshi et al. [23], who observed a transmission loss for F2 but not for F1 (and F0) with the use of face masks. So, it is suggested that speech signal alteration might be sound-specific and that some acoustic properties might be more sensitive to changes than others when face masks are used.

One of the main aims of this study was to investigate the effect of mask type on vowel properties. For this purpose, two face mask types were used: a surgical and a cotton mask. Both mask types affected the same vowels (except /u/) and the same properties of vowels. For example, vowel /a/ did not differ between the surgical and cotton mask conditions but differed between the surgical and no mask conditions and between the cotton and no mask conditions. However, this was not the case for vowel /e/, which differed between the surgical and no mask conditions, between the cotton and no mask conditions, and between the surgical and cotton mask conditions. In practice, this means that although both masks alter the speech signal of the same vowels, the signal of some vowels may be affected in a different way by each mask. This is interpreted as a result of the material of the mask, which acts as a filter of the sound properties. The literature suggests that surgical masks alter the quality of the speech signal to a lesser extent than other types of masks (e.g., [10, 29, 38]), and that cotton jersey (generic) masks offer a good acoustic performance that is comparable to that of surgical masks [10].
The results of this study are partially consistent with previous findings, as both the surgical and the cotton masks affected only one acoustic feature of vowels and for some vowels only, indicating that other important acoustic features such as F1 and duration remain intact. Although masks affected only one vowel property, we believe that this can create challenges in pronunciation teaching, where listeners have to receive explicit qualitative speech input.

The results can inform second/foreign language instructors who teach pronunciation during the pandemic. It was found that the acoustic properties of certain vowels were affected by face mask-wearing. In particular, vowels /e a u/ in the surgical mask and vowels /e a/ in the cotton mask were produced with different articulatory properties compared to the no mask condition; therefore, the speech signal of these vowels was altered. So, considering that Arabic learners of Greek often cannot accurately perceive and produce the Greek vowel /e/ due to the effect of their first language (see [16, 19, 20]), instructors should be careful when they teach this vowel, as face mask-wearing alters its speech properties. They should seek alternative ways to teach sounds. For instance, the use of masks may be avoided when sufficient distance is kept from the students, or computer-based speech software can be used for the teaching of sounds. The use of lapel microphones is another good practice, as most masks have little effect on them [10]. The results can also offer important implications for health care practitioners. For example, the teaching of native sounds to individuals with language disorders can be challenging for speech therapists. They should consider that the speech signal may not be accurate when masks are used, and thus they should find the best possible way to teach problematic speech sounds.

Effects of face mask-wearing on the acoustic properties of speech sounds were revealed.
One limitation of this study is that gender was not considered in the acoustic analysis. In addition, considering that some vowels were affected in a different way in the three conditions, it remains to be investigated more precisely why some vowels are affected more by face mask-wearing than other vowels. Future studies should also take into consideration the effect of face mask-wearing on other acoustic properties of vowels such as F3 and fundamental frequency.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Effects of forensically relevant face coverings on acoustic properties of Pahari central vowels
More speech degradations and considerations in the search for transparent face coverings during the COVID-19 pandemic
The effect of conventional and transparent surgical masks on speech understanding in individuals with and without hearing loss
Segmental errors in different word positions and their effects on intelligibility of non-native speech: All's well that begins well
Praat: doing phonetics by computer
Acoustic-phonetic characteristics of speech produced in noise and while wearing an oxygen mask
Face masks can be devastating for people with hearing loss
Face mask use in the community for reducing the spread of COVID-19: a systematic review
Intelligibility of face-masked speech depends on speaking style: comparing casual, clear, and emotional speech
Acoustic effects of medical, cloth, and transparent face masks on speech signals
Acoustic theory of speech production. The Hague: Mouton Publishers
Effects of forensically-relevant facial concealment on acoustic and perceptual properties of consonants. Doctoral dissertation
Speaking under cover: the effect of face-concealing garments on spectral properties of fricatives
Voice differences when wearing and not wearing a surgical mask
The use of aspirated consonants during speech may increase the transmission of COVID-19
Discrimination of L2 Greek vowel contrasts: evidence from learners with Arabic L1 background
The effect of face mask on the acoustic properties of vowels
Speech perception in visually impaired individuals might be diminished as a consequence of monomodal cue acquisition
Effects of phonetic training on the discrimination of second language sounds by learners with naturalistic access to the second language
Vowel learning in diglossic settings: evidence from Arabic-Greek learners
How the language we speak determines the transmission of COVID-19
How do medical masks degrade speech perception
COVID-19: acoustic measures of voice in individuals wearing different facemasks
Estimated Marginal Means, aka Least-Squares Means
Respiratory virus shedding in exhaled breath and efficacy of face masks
Effects of face masks on acoustic analysis and speech perception: implications for peripandemic protocols
Initial Exploration at Home: Acoustic Voice Markers With and Without Disposable Surgical Face Mask
Are acoustic markers of voice and speech signals affected by nose-and-mouth-covering respiratory protective masks? J Voice
The use of face masks during the COVID-19 pandemic in Poland: a survey study of 2315 young adults
Comparing the fit of N95, KN95, surgical, and cloth face masks and assessing the accuracy of fit checking
Speech intelligibility assessment of protective facemasks and air-purifying respirators
R: A language and environment for statistical computing. R Foundation for Statistical Computing
Analysis of face mask effect on speaker recognition
The effects of forensically relevant face coverings on the acoustic properties of fricatives
Quantal theory, enhancement and overlap
NORM: The Vowel Normalization and Plotting Suite
Effects of face masks on speech recognition in multi-talker babble noise

We would like to thank all individuals for their participation in the experiments. This paper has been supported by the RUDN University Strategic Academic Leadership Program.