UIUCDCS-R-71-479
COO-2118-0024

A COMPARATIVE STUDY OF SOME VISUAL SPEECH DISPLAYS

Bernard J. Nordmann, Jr., Ph.D.
Department of Computer Science
University of Illinois at Urbana-Champaign, 1971

September 10, 1971

The purpose of the present project was to develop a computer speech display simulation system capable of generating a wide variety of speech displays from a recorded speech input. Eventually it is hoped that this will lead to a system whereby a person can obtain visual feedback as a corrective measure for word pronunciation. The basic system would involve two displays, one representing the subject's pronunciation of a particular word and the other representing a correct pronunciation of the word. A computer would be used to process the incoming speech and produce a display containing features highly relevant to correct pronunciation. The subject's task would be to detect differences in the two displays and to change his pronunciation so as to make them more similar.

After conducting an extensive literature search to determine the types of schemes which had previously been used to display speech sounds, a basic interactive display system was programmed using the CSL CDC 1604 computer-graphics facility. The system has been designed to be open-ended and currently can produce photographs of a variety of display types. Unfortunately, the system as it stands now cannot operate in real time due to the slowness of the CDC 1604.

The simulation system was used to produce examples of several different types of displays. These displays were used in a series of preliminary tests designed to develop techniques for comparing the effectiveness of various types of displays. Several corrections and refinements to the testing methods are discussed.

TABLE OF CONTENTS

1. INTRODUCTION
2. CHARACTERISTICS OF SPEECH
   2.1 Problems in Speech Analysis
   2.2 Significant Parameters of Speech
3. HISTORY OF SPEECH DISPLAYS
   3.1 Early Displays
   3.2 Spectrographic Displays
   3.3 Spectrographic Variations
   3.4 Other Linear Time Displays
   3.5 Two-Dimensional X-Y Displays
   3.6 Zero-Crossing Displays
   3.7 Pitch Extracting Displays
   3.8 Miscellaneous Formats
   3.9 The Use of Speech Displays
4. PROPOSED STUDY
   4.1 Outline of the Study
   4.2 Theoretical Significance of the Comparison Tests
5. DISPLAY DESCRIPTIONS
   5.1 Variable-Intensity TV Scan Display
   5.2 Continuous Line Display
   5.3 Spectrogram
   5.4 Formant Extracting Display
   5.5 Zero-Crossing Display
   5.6 Zero-Crossing vs. Amplitude Envelope
6. SPEECH DISPLAY SIMULATION SYSTEM
   6.1 The Common Data Base
   6.2 The Command Processor
   6.3 The Speech Display Routines
   6.4 The Subprocessing Routines
   6.5 Basic System Principles
7. RESULTS
   7.1 Recordings
   7.2 Data from the First Test
   7.3 Data from the Second Test
8. SUMMARY AND CONCLUSIONS
   8.1 Comments on the Tests Which Were Performed
   8.2 Comments on the General Method
   8.3 Summary
REFERENCES
VITA

LIST OF FIGURES

1. Effect of Variations in High Frequency Emphasis and Intensity Truncation Using the Word "Shod"
2. Effect of Variations in Time Slice Size
3. Examples of the Spectrographic Display with Nominal Parameter Values
4. Effect of the Peak-Picking Process on the Spectrum Analysis of a Single Time Slice
5. Effect of the Peak-Picking Process on the Full Spectrographic Analysis of the Word "Beat"
6. Examples of the Formant Extracting Display
7. Examples of the Zero-Crossing Display
8. Block Diagram for Z and Z' vs. Amplitude Envelope Display
9. Examples of the Z' vs. Amplitude Envelope Display
10. Examples of the Z vs. Amplitude Envelope Display
11. Relationship Between ISAMP and ISAMPB

LIST OF TABLES

1. Distinctive Features
2. Commands Executed by Speech System
3. List of Recorded Words
4. Learning Rates for Spectrographic Display
5. Learning Rates for Zero-Crossing Display
6. Learning Rates for Formant Extracting Display
7. Confusion Matrix for Subject A, Test 1a, Spectrographic Display
8. Confusion Matrix for Subject B, Test 1a, Spectrographic Display
9. Confusion Matrix for Subject C, Test 1a, Spectrographic Display
10. Confusion Matrix for Subject D, Test 1a, Spectrographic Display
11. Confusion Matrix for Subject E, Test 1a, Spectrographic Display
12. Confusion Matrix for Subject A, Test 1b, Spectrographic Display
13. Confusion Matrix for Subject B, Test 1b, Spectrographic Display
14. Confusion Matrix for Subject A, Test 1a, Zero-Crossing Display
15. Confusion Matrix for Subject B, Test 1a, Zero-Crossing Display
16. Confusion Matrix for Subject A, Test 1b, Zero-Crossing Display
17. Confusion Matrix for Subject B, Test 1b, Zero-Crossing Display
18. Confusion Matrix for Subject A, Test 1a, Formant Extraction
19. Confusion Matrix for Subject A, Test 1b, Formant Extraction
20. Detailed Comparison Matrix for Subject A, Test 2, Spectrographic Display
21. Summary Comparison Matrix for Subject A, Test 2, Spectrographic Display
22. Detailed Comparison Matrix for Subject A, Test 2, Zero-Crossing Display
23. Summary Comparison Matrix for Subject A, Test 2, Zero-Crossing Display

Chapter 1 INTRODUCTION

The purpose of this study is to investigate several methods for producing visual displays of speech signals. Visual speech displays are generally used either as speech analyzers or as speech recognizers. In the first case they can be used to extract a greater or lesser amount of information from a speech utterance, and this information can then be recorded and compared with displays of other utterances to determine the types of information which characterize speech. Traditionally, there have been two separate approaches to speech display analysis: one which attempts to determine a display transform which will present all the information necessary to determine the various phonemes, and the other which takes a display of a single type of speech parameter and tries to see how much discrimination can be obtained from it. The former approach has traditionally been followed by experimenters whose eventual aim was to build a workable speech recognizer, while the latter approach has been used by people involved in speech therapy to help correct specific speech problems.
An additional distinction between the approaches is that the former have tended to be much more expensive. In the speech recognition type of display utilization, the display produces a visual image from a sound input and the viewer has to decide what utterance, out of all possible utterances, is being displayed. In the most powerful form of this display, the speech typewriter, the output would consist of the typed version of the word or words spoken. It can be argued that this is not a display but rather a full-fledged speech recognizer. In any case, we will ignore it for the present. In the less powerful forms, this type of display produces an output image which represents some transformation of the speech input and which the viewer, possibly only after much practice, is expected to recognize.

The purpose of the present project is eventually to develop a display system which can be used as a visual feedback link for pronunciation. At the most advanced level, we might have a system which would analyze the user's utterance, compare it with some standard, and then flash a "yes" or "no" light. However, this would involve a much better knowledge of speech and the speech mechanism than is currently available. It would also provide no information about what was particularly wrong with the utterance. Thus the purpose of the present project was to eventually develop a visual display system which would present the transformed image of the user's utterance along with an image of the standard. The standard might be an idealized form generated by the display unit, or it could be the version just spoken by an instructor. In either case it would be the task of the user to correct the image of his version by repronouncing it until it approached the given standard to within the appropriate tolerances.

Such a system could be used in any situation in which a person requires a visual corrective feedback path to improve his speech. One excellent example is that of people who have been deaf from a very early age. Because they are unable to hear their own voice or the voices of others, it is very difficult for them to learn correct pronunciation. A visual feedback device would be very helpful in such a situation. A second example, though not as desperately important, would be in the area of foreign language teaching, in which the visual feedback could be used as a supplement to conventional language training.

In order to develop this type of display system, several steps must be taken:

1) A suitable transformation must be found to convert the spoken speech input into some format capable of being displayed.

2) Depending on the type of display chosen, tolerances must be developed so that it is possible to tell when two spoken utterances are acceptably close.

3) A suitable technique for instructing students in the use of the display must be developed, since it is doubtful that any of the displays will be suitable for use without some period of instruction and practice.

The purpose of this study was to investigate various types of speech displays, to produce acceptable simulations of several of these displays using a computer-driven graphics display system, to develop some type of standardized evaluating procedure for speech displays, and to apply this standard procedure to certain selected types of displays.

The remaining sections of this report can be read more or less independently.
Section 2 is an elementary discussion of the characteristics of speech with an emphasis on those details which can cause trouble in speech recognition and speech display systems. Section 3 traces the history of the development of the various types of speech displays. Section 4 contains a discussion of the simulation, testing, and evaluation procedures to be used in the study. Sections 5 and 6 contain, first, a description of the various displays, and then a summary description of the computer programs used in the simulation. A more detailed description of each program, including the listings and various test programs, can be found in Nordmann [1971]. Section 7 discusses the results of a preliminary evaluation study, while Section 8 summarizes the results and conclusions of the study and outlines further possible avenues of research. Section 9 contains the list of references used in the report.

Chapter 2 CHARACTERISTICS OF SPEECH

2.1 Problems in Speech Analysis

Speech processing devices have long been plagued with various problems which result from the characteristics of speech itself and from the effects of individual speaker differences. As Liberman, et al. [1967a] have explained, "the sounds of speech are a special and especially efficient code on the phonemic structure of language, not a cipher or alphabet". What this means is that the phonemic message being transmitted is highly restructured at the level of sound. As a result, the speech signal characteristics of a given phonemic unit vary greatly according to context. The basic biological reason for the recoding is the fact that both the ear and the vocal articulators are slow speed devices, so that in order to deliver information at a higher rate, it is necessary to operate in parallel at both ends of the communication channel. Thus a given speech characteristic will, in general, give information about more than one phoneme, and a given phoneme will be determined by more than one particular set of speech characteristics. Obviously this characteristic of speech greatly complicates any attempts at speech processing.

Bobrow and Klatt [1968] have discussed a variety of the more mundane problems involved in speech processing. Some of these problems are as follows:

1) The intensity range from one utterance to the next varies tremendously due to different amounts of vocal effort on the part of the speaker and the varying distance between the person speaking and the microphone.

2) The onset time of an unknown word is not a simple feature to detect reliably. This is true especially for certain initial voiceless consonants. It is also fairly difficult to separate the various phonemes which make up an utterance because the parallel operation of the speech mechanism does not produce a clear-cut phoneme boundary. The most successful methods developed so far (e.g. Reddy [1966], Hughes and Hemdal [1965], Sakai and Doshita [1963], Otten [1964c], etc.) involve the establishment of certain parameters of the speech signal which are measured over extremely short periods of time. The behavior of these parameters from one time interval to the next then serves to establish whether the particular interval is the beginning of a new phoneme or a continuation of the previous one.

3) The duration of a word is highly variable. In addition, an increase in speaking rate is not accompanied by a decrease in the length of time for each phoneme by the same proportional amount.
For example, the time needed to pronounce stop consonants such as "p" or "b" is not as greatly affected by changes in speaking rate as is the time needed for vowels. Thus the time normalization problem is non-trivial.

4) Variations in stress and accent can greatly change the acoustical properties of the speech signal.

5) Each speaker has a different vocal cavity configuration, and as a result each speaker generates a speech signal with a different spectral configuration.

These problems, although originally discussed in the context of speech recognition, are also critical sources of variance in speech displays. In order to produce an effective display, some means must be found for reducing or normalizing the effects just mentioned and accentuating the effects which are relevant to distinguishing between different phonemes and words. In the system being proposed this will be done by using two displays, where the first display is produced by the subject and the second is presented as a standard. The task of the subject is to compare the two displays and to decide in what particulars, if any, they differ. It is hoped that most of the normalization problems can be solved by a combination of using the proper physical display and training the human observer to perform the proper pattern recognition tasks. After sufficient training the subjects should be capable of making the proper generalizations between two displays and determining the relevant points of difference and similarity.

2.2 Significant Parameters of Speech

In order to make the observer's task as easy as possible, the display should present only those speech parameters which are necessary for the recognition of the speech itself. Over the past twenty-five years a variety of research has been carried out in the search for these "significant parameters". One of the more important features is the frequency structure of the speech wave. This structure typically peaks at three or four frequencies due to the resonating effects produced by the oral cavity during the production of speech. These peaks are called formants (Potter [1947] originally called them "hubs") and are most prominent during vowels and other voiced sounds. They are numbered beginning with the lowest frequency first. Although the absolute frequency ranges of the various formants overlap from one speaker to another and from one utterance to another by the same speaker (Campanella, et al. [1965]), the relative positions of these formants appear to be important in determining steady state sounds such as vowels (Potter [1947], Fry [1958]). In particular, it appears that the relationship of the formant frequencies of a given vowel to the formant frequencies of the other vowels spoken by the same speaker is important in the identification of that vowel (Ladefoged [1957]). Thomas [1966] has also shown that the second formant is the most important in this respect.

An even more important feature appears to be the transitions which the formants make during speech. These transitions occur as the vocal apparatus changes its configuration in order to pronounce the next phoneme in a given word. The Haskins Laboratories have done a considerable amount of work in this area by using a speech synthesis technique, in which various formant structures are converted to speech, and then checking this synthetic speech for its similarity to real speech (DeLattre, et al. [1955], Harris, et al. [1958], Liberman [1957], Liberman, et al. [1954], Liberman, et al. [1948], etc.).
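To make the notion of formants concrete, the following sketch locates formant-like peaks in the short-time spectrum of a single voiced frame. It is only an illustration of the idea, not the procedure used by any of the studies cited above; the window, smoothing width, and peak criteria are arbitrary choices, and the numpy and scipy libraries are assumed to be available.

```python
# Illustrative sketch: find formant-like peaks in one short-time spectrum.
# All parameter values are arbitrary illustrative choices.
import numpy as np
from scipy.signal import find_peaks

def formant_peaks(frame, fs, n_peaks=3):
    """Return the frequencies (Hz) of the strongest peaks of a voiced frame."""
    windowed = frame * np.hamming(len(frame))          # taper to reduce leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    # Smooth the magnitude spectrum so individual pitch harmonics merge
    # into broader humps corresponding to the vocal tract resonances.
    envelope = np.convolve(spectrum, np.ones(5) / 5.0, mode="same")
    peaks, props = find_peaks(envelope, height=0.1 * envelope.max())
    strongest = peaks[np.argsort(props["peak_heights"])[::-1][:n_peaks]]
    return np.sort(freqs[strongest])                   # lowest formant first
```

The sorted output mirrors the numbering convention described above: the lowest-frequency peak would be taken as the first formant, the next as the second, and so on.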
J. P. Radley [1956] has criticized this technique in that it used synthetic speech, but when he performed analyses of real speech, many of his results were similar. A summary of the cues which are useful in studying formant structure is given in Liberman, et al. [1959]. In addition to working on the transitions, Radley noted that sound bursts in the high frequency region were also important, especially in consonants such as "p", "t", and "k". Halle, et al. [1957] and Fry [1958] have also discussed this, and Fry observed that it is necessary to measure the duration of the noise as well as its spectral qualities.

A different method of characterizing speech has been proposed by Roman Jakobson, Fant and Halle (see Jakobson, Fant, and Halle [1952] and Jakobson and Halle [1956]). This method sorts out sounds using decisions based on the presence or absence of certain distinctive features such as voicing, nasalization, etc. Table 1 gives a partial listing of some distinctive features and their values for certain phonemes. Various authors differ as to what is included in the list of distinctive features. The list in Table 1 is a composite of several different lists. The important point as far as speech recognition is concerned is that the features can be determined independently and each has only a few possible values (usually only 2). This makes these features an ideal analysis method, since the values of the various features can be determined from the speech wave without resorting to highly precise measurements.

[Table 1. Distinctive features (including voicing, nasalization, and place of articulation) and their values, + or -, for selected phonemes; the body of the table is not recoverable from the scan.]

In general the use of distinctive features has been somewhat successful in speech recognition (Hughes [1961] and Hughes and Hemdal [1965]) but has found only limited use in speech displays. This latter fact may be due to the difficulty of producing an adequate display of 8 or 10 variables. In the one example known to this author (Upton [1968]) the display was specifically designed as a supplement to normal lipreading and as a result displayed only those features which were specifically hard to see from lip movements alone.

In addition to the various types of information already mentioned, there are other types of speech parameters which might prove useful. Potter [1945] has suggested that pitch must be shown if a display is to be used for speech correction. This is certainly one of the speech functions most often involved in attempts to correct the poor speech habits of the deaf.

Its significance in speech display applications which do not involve deaf subjects is probably not as great, although it still may be of some importance.

Chapter 3 HISTORY OF SPEECH DISPLAYS

3.1 Early Displays

The first devices which were used to make speech visible were mechanical in nature and were used for speech correction purposes. Several types were in existence in the early 1900's which utilized flames into which the subject's speech was directed by means of hollow tubes. The successive waves of dense and rarefied air caused variations in the number of ions available to the flame and consequently caused the flame to flicker in a manner characteristic of the speech qualities of the subject. Abramson [1952] describes several of these devices and how they are used in speech therapy. Characteristically, these devices were able to produce only a very gross display of the speech, and about the only things that could be determined from them were the pitch, the presence of nasalization, or the relative volume of the speech. However, this is often quite helpful, and due to the low cost of these devices, some of them are still in use.

Another very early type of display was an ordinary speech signal (i.e. microphone output) vs. time display. Abramson [1952] and Pronovost [1947], in their surveys on visual speech aids, mention oscillographic displays, but generally these displays do not give much useful information. Flowers [1916] was able to produce one of these displays in 1916 without the use of an oscilloscope by using a string galvanometer. An arc lamp projected the shadow of the galvanometer's silver-plated quartz fiber onto a perpendicular slit behind which a photographic film was moving perpendicular to the motion of the string. When a subject spoke into a microphone attached to the galvanometer, a picture of the speech signal as a function of time was produced.

There were several other types of devices discussed by both Abramson and Pronovost which have also been called visual speech aids. However, in many cases these devices are quite passive. Two examples are the so-called "Lite-O-Letter", a game-like device utilizing a display of transparent letters which can be lit by push buttons, and the "Chromovox" (also described by Cavanagh [1951]), which involved a moving display of words and pictures to be spoken by the deaf pupil and a series of lights controlled by the teacher and used for reinforcement. Since these devices depend entirely on the skill of a speech therapist to judge the correctness of the speech sound and activate the proper indicator, they will not be considered any further here.

3.2 Spectrographic Displays

The emphasis on the more modern, electronic displays began with a Bell Telephone Laboratories project which was started early in 1941. A device for the visual translation of sound was needed in order to carry on some special studies in speech distortion which were part of the war effort. Once the needs of the military had been accomplished, however, it became possible to work on the device with the view of producing a form of "visual hearing". The device itself was called "the sound spectrograph" and produced a three-dimensional representation of the speech signal in which time was plotted on the horizontal axis and frequency on the vertical axis, with the intensity of the particular frequency component at a given time being represented by the intensity of the display at that point.
Later a variety of displays were developed using three-dimensional formats with the time dimension being represented along the horizontal axis. For the remainder of this paper this display format type will be referred to as a linear time display.

The first published reports of spectrographic linear time displays began to appear as soon as the war ended and for several years thereafter (Kopp [1946], Peterson [1954], Potter [1946], Riesz and Schott [1946], and Steinberg and French [1946]). There were actually several different types. One of the first types (Koenig, et al. [1946]) produced a permanent record by repeatedly analyzing the speech signal with a variable center frequency filter and displaying the rectified filter output on a piece of paper by means of a variable intensity stylus. Another model (Dudley and Gruenz [1946]) used a moving phosphor belt and parallel filters to display the signal in real time. Still a third (Mathes, et al. [1949], Johnson [1946]) used a magnetic disk and CRT system which recorded the signal and then replayed it many times at very high speed using a variable filter to give a rapid CRT display.

In 1947, Potter, Kopp and Green published the first edition of their book, Visible Speech [1947], which described the work they had done at Bell Laboratories. They had attempted to teach people to read the spectrograms they had produced much as one would read a book. They began with a group of five young women in the fall of 1943. The instruction schedule called for two hours of group instruction and one hour of individual study each day. The following year four more young women were added to the group, and also a male electrical engineer who was congenitally deaf. The learning rate for the newcomers to the group was about 3-1/2 words per hour of study. The engineer eventually achieved a vocabulary of 800 words. The four female newcomers achieved between 100 and 300 words, but they had not practiced as long. Within the limits of their vocabulary, the visible speech class members were able to converse by enunciating clearly and at a fairly slow rate. Potter remarked that intelligibility was roughly equivalent to a very noisy telephone connection.

Later on, the original Visible Speech Translator was moved to the Detroit School for the Deaf, where Kopp and Kopp [1963a, 1963b] used it to teach speech intonation and stress to deaf children. Similar versions based on its design were fabricated at other locations as well (e.g. House, et al. [1968]). In 1965-1966 a transistorized version of the translator was produced at Bell Telephone Laboratories. Stark, et al. [1968] have reported on its use as a training aid for deaf subjects. They found that, especially in the case of younger subjects, the display was of significant help but that supplemental speech instruction was also necessary.

As interest in speech spectrograms grew, various other groups designed devices for producing them. The Haskins Laboratory began speech investigations using synthetic spectrograms and a "pattern playback" device which was a "spectrograph" in reverse. Ramaswamy [1962], Harris and Waite [1963], Presti [1957, 1966] and many others developed spectrographs of varying speeds. However, they all produced the same general type of display, differing only in the way the display was produced.

3.3 Spectrographic Variations

Unfortunately there were several problems with the sound spectrograph.
In addition to the poor overall quality of transmission, one of the major problems was that some of the more important features which were necessary for distinguishing between different words were not always easy to see on the display. Therefore, as time went by, various improvements were attempted.

Koenig and Ruppel [1948] describe several methods for increasing the visible dynamic range of the spectrogram. One method involved using a dot display where the density of the dots represented the intensity of the specific frequency component. Another method, which was also described by Prestigiacomo [1962], used contours to display the intensity. A third method, which was further elaborated by Kersta [1948], involved reducing the spectrogram to a frequency vs. frequency magnitude plot only for specific instants of time. This allows the frequency distribution to be shown in more detail but drastically restricts the number of time intervals shown.

Another modification was one by Kock and Miller [1952] in which a differentiated version of the spectrogram was used. The display involved the differentiation of the time-amplitude pattern for different points on the spectrum. The advantage claimed for this method was that rapid changes in spectrum content, which tend to contain the most phonemic information, show up more easily. D. E. Wood and T. L. Hewitt have described another modification [1963, 1964] in which a real time spectrograph was used to display just the peaks of the spectral cross sections. This eliminated the need for intensity modulation of the visual display. This display, as does the Kock and Miller display, emphasizes the formant frequency excursions since it is not cluttered with as much "background" data. In use as a speech analyzer this display was quite informative. However, it was not completely satisfactory in the case of stop-consonant bursts and other such signals.

3.4 Other Linear Time Displays

As more work was done with spectrograms, their limitations became increasingly apparent. Although they were a good means of displaying the detailed information for an analysis of speech, they could not be read easily or quickly. As a result, several other linear time displays were tried. These displays used the same format but processed the speech signals using different techniques in the hope that they would be easier to "read". A display by Biddulph [1954] and an earlier one by Bennett [1953] utilized autocorrelation functions and displayed the delay parameter, τ, vs. time, with the magnitude of the autocorrelation function being shown as the intensity. As it turned out, this display was actually harder to read than a spectrogram, since it was very sensitive to non-critical information in the speech signal. Huggins [1954] and Stevens [1950] have each given a detailed analysis and critique of this method. Huggins shows that slight changes in pitch may cause large changes in the display. One other undesirable characteristic of the display was that it was a quadratic function of the frequency components, and thus a large dominant frequency could obscure the effects of smaller amplitude frequency components.

3.5 Two-Dimensional X-Y Displays

All of the displays discussed so far have been linear time displays utilizing three display parameters. Another type of display format which has been developed involves only two dimensions, in which time is generally omitted as a direct display parameter.
Instead these displays use the chosen parameters as "x" and "y" inputs to a plotter (usually a CRT) which then plots the resulting point as the parameters vary with time. By using time only in this indirect sense, the originators of these x-y displays hoped to eliminate the effect on their displays of varying time duration between different utterances of the same word.

One type of x-y display which was developed utilized 90° phase shifting circuits. In this type of display the processing hardware converted the original speech input into two output signals which were 90° out of phase with one another (a software sketch of this quadrature technique appears below). Lerner [1952, 1959], Vilbig [1954], and Barton and Barton [1963] have all described displays of this type. These displays have been tested by several people, but the results are inconclusive. J. E. Connor [1955] and F. E. Fabian [1955] evaluated the effectiveness of Lerner's display in speech correction and claimed that it was just as good as, but no better than, "conventional" speech therapy in the case of articulation disorders, but of no significant help in voice improvement. However, in a later preliminary study, Pronovost [1964] felt that this display showed some promise in improving the articulatory proficiency of deaf children. Unfortunately, a subsequent study (Pronovost, et al. [1968]) was unable to produce more definite results. Pyron and Williamson [1964] gave a critique of Barton and Barton's apparatus and indicated what they thought was the general problem with all such techniques, namely that they work best on continuous sounds (i.e. vowels and nasal consonants) and are very poor on transitions (i.e. consonants), which carry a high proportion of the speech information.

A different type of x-y display has been developed in Switzerland by Dreyfus-Graf [1946, 1948, 1950a, 1950b]. This display uses a system of filters and differentiators to produce pulses which control the movement of an ink pen. The author claimed that the resulting squiggles, which do appear fairly consistent for sustained vowels, could be used as a phonetic shorthand. However, as far as this author knows, there has been no report on the use of this device with a normal speech input.

Another x-y display using a CRT has been reported by Plomp, Pols and Van de Geer [1967]. They analyzed 15 Dutch vowels by using a bank of 18 filters to process the speech signals and studying the differences between the vowel spectra. The resulting dimensional analysis yielded four dimensions which accounted for 96.4% of the total variance once the between-subject variance had been allowed for. The authors suggested using plots of the first dimension vs. the second as an aid for the deaf. An oscilloscope display for the vowels has been produced, but work is only beginning on the consonants. This method was suggested as an alternative to a type of display in which the frequencies of the first and second formants for various vowels are plotted as points or regions on a two-dimensional graph (see for example Davis [1952], Foulkes [1961], Hughes [1965] or Majewski [1967]). Although this type of representation is very appropriate for vowel sounds, it has not met with much success in the representation of consonants. It remains to be seen if Plomp, et al. will be able to apply their technique to the consonants.
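As an aside, the quadrature signal produced by the 90° phase-shifting networks described at the beginning of this section can be approximated in software with a Hilbert transform. The sketch below is an assumed modern reconstruction of the idea, not a model of any of the analog devices cited above; it presumes that numpy, scipy, and matplotlib are available.

```python
# Sketch of a 90-degree phase-splitting x-y display: the Hilbert transform
# yields a quadrature (90-degree shifted) copy of the signal, and plotting
# the two against each other traces a Lissajous-like pattern on the screen.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import hilbert

def phase_split_pattern(speech):
    analytic = hilbert(speech)       # complex signal: original + j*quadrature
    x = np.real(analytic)            # original speech signal
    y = np.imag(analytic)            # 90-degree shifted version
    plt.plot(x, y, linewidth=0.3)
    plt.xlabel("signal")
    plt.ylabel("quadrature component")
    plt.show()
```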
Cohen [1968] has described an x-y display, developed by Arthur D. Little, Inc., which was made from a converted TV set. It used a type of frequency analysis somewhat similar to the cepstrum analysis technique (Noll [1964, 1967]), in which the log of the output of a spectral analysis is subjected to another "spectral analysis" to determine "shape" characteristics of the original spectrum. The ADL display makes use of 10 filter channels and, by the use of various weighting factors, resolves their outputs into sine and cosine components of the frequency spectrum envelope. These two components are then plotted as the x and y coordinates of the display. The net result is somewhat analogous to a two-formant display, but the problem of formant identification is avoided. The device is currently undergoing evaluation for use by deaf people for speech improvement.

3.6 Zero-Crossing Displays

In addition to classifying visual displays according to their physical format, they can be separated according to the type of processing used on the speech signal. Thus we have already discussed spectrographic, correlation, and phase splitting displays, among others. Another very common type of processing is the extraction of zero-crossing information. One of the reasons this type of processing is so popular is that it can be easily performed using a high gain amplifier and clipping circuit, and is thus cheaply implemented.

One linear time display version of a zero-crossing display was developed by Chang, et al. [1951b] and further developed by Sakai and Inoue [1960]. It was called an "intervalgram". This display used the time intervals between zero-crossings or between zero-slopes (i.e. zero-crossings of the differentiated signal) as a parameter to be plotted against time. The display produced a dot for each interval between zero-crossings, where the horizontal position of the dot was determined by the relative time position of the interval and its vertical position by the frequency of the sinusoidal signal which would have produced an equivalent interval between zero-crossings. The result is a halftone display consisting of dots which looks somewhat similar to a spectrogram. C. C. Bridges [1964] has produced a simpler linear time zero-crossing display by plotting the zero-crossing rate as a function of time on an oscilloscope.

The main justification for using these parameters was the finding by Licklider and Pollack [1948], Licklider [1959], and others, that highly clipped speech signals, and highly clipped differentiated speech signals, were still quite intelligible to the human ear. Thus, since these clipped signals contain only interval information about zero-crossings or zero-slopes, a display of this information should contain all the essential information of speech. In addition, of course, these parameters were much easier to obtain than spectrograms or correlation patterns. However, the authors were unable to show that intervalgrams were any easier to read, although Sakai and Doshita [1963, 1968] did use this technique for speech analysis and recognition.

Pyron and Williamson [1965] have developed an x-y display utilizing zero-crossing information in which they extracted the amplitude envelope of the speech signal as well as the rate of zero-crossings and the rate of zero-slopes. They experimented with plots of amplitude vs. zero-crossings, zero-crossings vs. zero-slopes, and amplitude vs. zero-slopes, but since they discovered that the latter gave consistently clearer and more characteristic patterns, most of their results are concerned with that form. As the authors noted in their report, Chang [1951a] has provided a theoretical analysis and experimental evidence to show that, in speech signals with a pronounced formant structure, the rate of zero-crossings corresponds to the first speech formant while the rate of zero-slopes corresponds to the second speech formant. Thus, their display is analogous to an amplitude envelope vs. second formant x-y display.
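A minimal sketch of the three measurements used by Pyron and Williamson follows: the amplitude envelope, the zero-crossing rate, and the zero-slope rate (the zero-crossing rate of the differentiated signal), each computed over short frames. The frame length is an arbitrary illustrative value, numpy is assumed, and this is a reconstruction of the general technique rather than of their hardware.

```python
# Per-frame amplitude envelope, zero-crossing rate, and zero-slope rate.
import numpy as np

def crossings_per_second(x, fs):
    """Rate of sign changes in x, in crossings per second."""
    signs = np.sign(x)
    signs[signs == 0] = 1                      # treat exact zeros as positive
    return np.count_nonzero(np.diff(signs)) * fs / len(x)

def frame_parameters(speech, fs, frame_len=256):
    diff = np.diff(speech)                     # differentiated signal
    env, zcr, zsr = [], [], []
    for start in range(0, len(speech) - frame_len, frame_len):
        frame = speech[start:start + frame_len]
        env.append(np.abs(frame).max())        # peak of the rectified frame
        zcr.append(crossings_per_second(frame, fs))
        zsr.append(crossings_per_second(diff[start:start + frame_len], fs))
    return env, zcr, zsr
```

Plotting env against zsr frame by frame would then correspond to the amplitude vs. zero-slopes form that Pyron and Williamson preferred.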
Ewing and Taylor [1969] have duplicated Pyron and Williamson's display and have attempted to improve upon their results. They initially worked with a zero-crossing vs. zero-slope type of display with the eventual aim of generating patterns which could be recognized by computer. They also tried adding a time sweep to both axes, which gave the display a diagonal rise across the face of the CRT. However, they felt their most promising version was one in which the difference between the zero-crossing and zero-slope signals was plotted vs. time. In this case they still did not get the desired results, but they felt that this was due to poor comparison methods during the recognition phase of their procedure.

3.7 Pitch Extracting Displays

Another type of processing used in producing speech displays is pitch extraction. As early as the 1930's, Coyne [1938a, 1938b] and Timberlake [1938] reported on a voice pitch indicator using 14 to 20 mechanical band-pass filters (i.e. tuning forks) with lamps which indicated the pitch frequency. Its use in South African schools for the deaf has shown good results for younger subjects but negative results for older subjects with settled voice habits.

Dolansky [1955] has described a pitch extracting device based on a time domain analysis. The descendants of this device have been used to produce displays which have been used in several experiments. These displays are linear time displays but use only two dimensions. Time is on the horizontal axis, with the position on the vertical axis indicating the pitch period of the incoming speech. The intensity of the display is turned off when no voicing is present but, other than this, is independent of the speech input. F. Anderson [1960] has used a version of Dolansky's pitch extractor utilizing a revolving CRT with a view panel, cut so that only a portion is displayed on the vertical axis against a continuous horizontal time base. The CRT uses a long-persistence phosphor so that the display can be seen for five seconds. The display was used with a group of eight children from ages 8 to 12, with hearing losses of 60 db or more. It appears to have been somewhat useful, although the author did not go into detail about it.

The group headed by Dolansky at Northeastern University continued to work on pitch displays (Dolansky, et al. [1965], Dolansky and Phillips [1966], and Phillips, et al. [1968]). They performed several studies using deaf children as subjects as well as normal hearing university students. The results indicated that the display was of some use in teaching deaf children and that it was possible to use the display as a visual feedback indicator for speech pitch.
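Dolansky's device is not described here in enough detail to reproduce, but the general idea of time-domain pitch extraction can be illustrated with a short autocorrelation sketch. The 50-400 Hz search range is an assumption meant to cover typical voice pitch, and numpy is assumed to be available.

```python
# Generic time-domain pitch estimate for one voiced frame: the lag of the
# strongest autocorrelation peak is taken as the pitch period.
import numpy as np

def pitch_hz(frame, fs, f_lo=50.0, f_hi=400.0):
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f_hi)                   # shortest period considered
    lag_max = min(int(fs / f_lo), len(ac) - 1)
    best_lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return fs / best_lag
```

A display like those described above would plot this estimate (or its reciprocal, the pitch period) against time, blanking the trace during unvoiced intervals.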
A variety of other researchers have developed pitch extraction displays (Gruenz and Schott [1949], Plant [1960], Martony [1968], and others). In addition, several displays have been made which incorporated a pitch display along with some other type. Stark's spectrographic display [1968], mentioned earlier, uses pitch and amplitude as well as spectrographic information. Pickett and Constam [1968] describe a multi-display device developed at the Hearing and Speech Center of Gallaudet College which, in addition to being able to produce a pitch display, could also generate vowel spectrum indications, intensity vs. pitch displays, and intensity contours.

3.8 Miscellaneous Formats

In addition to the linear time and x-y display formats, there has been a variety of other types of attempts. D. E. Williams [1967] has designed a light bulb display which consists of a matrix of lights and an electronic circuit to drive it, which frequency analyzes an utterance into 10 frequency regions and displays the results in bar graph form. There was also a second display which indicated the relative length of time each frequency component was above a certain threshold. However, the display appears to be valid only for sustained sounds like vowels, and even in these cases it varies tremendously with such irrelevant variables as distance from the microphone, speech rate, etc.

Hubert W. Upton has developed a wearable eyeglass speechreading aid (Upton [1968], Pickett [1969], Risberg [1969]) which detects voicing, friction, stops, etc. Miniature lights embedded in the eyeglasses glow whenever the corresponding speech feature is present. The device was specifically designed as an aid to lipreading, and therefore the speech features which were chosen were those not visible on a speaker's lips. The designer noted that although the analyzing functions did not work perfectly, the device still gave a significant amount of information not obtainable by lip reading alone.

In addition to these displays, there are several other types which have been in use by speech therapists but which do not fit neatly into any of the categories mentioned so far. Risberg [1968] discusses a variety of these devices which he helped to develop, including various types of indicators for fricatives, s-sounds, intonation, rhythm, and nasalization. Some of these displays might be called linear time displays, but others involve simply meters or lights which turn on when a given threshold has been reached for the quantity being measured. The primary principle was to minimize the number of functions displayed by a single device. This was done both to decrease the cost and to isolate the speech feature to be controlled.

3.9 The Use of Speech Displays

Although a wide range of speech displays has been developed over the past twenty years, there has been a reluctance on the part of speech therapists to make widespread use of them. The reasons for this have been briefly mentioned above and center on the cost of the devices and the pedagogical problems which they produce. From a cost standpoint, it is easy to see that the more complicated (and thus more costly) displays would badly distort the small budgets of most schools for the deaf. The pedagogical resistance might be a little harder to justify. However, although it may be true that some of the resistance is simply a result of innate conservatism on the part of teachers of the deaf, it is also true that very little testing has been performed on the effectiveness of the various display types. Thus the fears of these teachers toward using untested techniques on children whose futures may depend on them are somewhat justified. More recently, however, the situation has been changing.
A variety of small experiments have been performed to determine the feasibility of using particular displays as a visual feedback link to replace the auditory feedback link which has been destroyed in deaf people. The primary goal has been to use some type of visual display to indicate to the deaf subject how correct or incorrect his pronunciation actually is. In general, these studies have been promising for younger subjects, though not as successful for older subjects. The tests themselves have mostly been performed by specialists in the area of speech training and have involved the simpler types of displays. Cost is the obvious reason for this latter fact. This same fact also makes it very difficult for any one group to build and test more than one or two displays at the same time. As a result there has been very little work done in developing general testing techniques which could be applied by a single group to a wide variety of displays in order to determine the relative effectiveness of the different types. Happily, this trend appears to be reversing, as can be seen by the previously mentioned development of systems which can produce more than one display type.

Although many groups have been able to use speech displays as feedback aids in speech correction for the deaf, the original goal of the Bell Laboratory group, i.e. actually reading the display, has yet to be achieved. It has in fact been suggested by A. M. Liberman, et al. [1967a] of the Haskins Laboratory that we may never be able to perform this type of direct conversion. This is so, they maintain, because there is no simple one-to-one correspondence between the characteristics of the speech signal and the phonemes which it represents (Liberman, et al. [1967b]). Since the speech signal is basically a complex code as opposed to a simple cipher, the phonemic message being transmitted is highly restructured at the level of sound. As a result, the speech signal characteristics of a given phonemic unit vary greatly according to context. The basic biological reason for the recoding is the fact that both the ear and the vocal articulators are slow speed devices, so that in order to deliver information at a higher rate, it is necessary to operate in parallel at both ends of the communication channel. Thus a given speech characteristic will, in general, give information about more than one phoneme, and a given phoneme will be determined by more than one set of speech characteristics.

The key point in their argument against the readability of visible speech, however, is their statement that although a decoder of such signals obviously exists, it appears to be unalterably linked to the auditory sensory system. Thus, although it might be possible to create displays which emphasize the important key features of the speech signal, it does not appear possible to produce a display which would allow the viewer to unconsciously decode the signal into phonemes. It should be noted that this key point, by the authors' admission, appears to be true only because in 20 years of experience nobody has been able to learn to visually decode spectrograms without a great deal of conscious mental effort.

Recently, however, Lenneberg [1967] has discussed the effect of age and development on the learning of a language. According to a variety of experiments, it appears that the development of speech is impossible once a human has reached approximately the age of puberty.
Before this time, humans are capable of learning language even if large portions of the brain which are normally connected with this process are destroyed by disease or accident. The brain seems to be very plastic at this age and highly adaptable. As a result, it may be possible that the proposition put forth by the Haskins group will hold only for adults, since their brains have already "frozen" into a permanent state. This could also explain why younger subjects seem to get the most help from feedback type displays. It would be interesting to try to teach a deaf child to read visible speech, since in this case the child's brain might actually be able to adapt itself to decoding the visible input.

Be this as it may, if we grant the fact that the human eye cannot be trained to become an automatic speech decoder (at least once the subject passes a certain age), then the task of using a visual display as a speech feedback mechanism for adults can be looked at from two positions. If the feedback device actually performs the decoding before presenting the visual display, then it becomes, in effect, a speech recognizer. This is precisely what the last 20 years of speech research has been trying to achieve, but without too much success. In addition, it would not be very helpful in the present task, since it would not be giving the information which a poor speaker needs to correct his pronunciation. We can, on the other hand, ignore the absolute decoding problem and instead concentrate on displaying the most relevant speech parameters in a concise manner. In this case the observer would not necessarily be able to recognize the words merely from the display. However, if the proper parameters are displayed in an easily discerned manner, it should be possible for the observer to detect the differences between his pronunciation and a comparison display of the same speech pronounced properly. This is the eventual goal which has been set up for this project.

Chapter 4 PROPOSED STUDY

The eventual aim of this research is to develop a computer driven display system which can be used as a visual feedback link to correct mispronunciations by people who are deaf, or in other situations, such as language training, where corrective feedback on pronunciation may be desirable. The envisioned system would present two displays to the user, one of the word as it is supposed to be pronounced and one as it is pronounced by the user. His task will be to determine if they are acceptably close (this may be possible only after a certain amount of instruction and practice) and, if not, to determine which parts are in error and change his pronunciation accordingly.

The more immediate goal of this particular study has been the development of a generalized computer simulated display system. This system has been built so that it can utilize a variety of speech processing techniques and easily produce a wide range of speech display types. In addition, several of these displays can be compared with one another to determine which of them is most effective in terms of presentation of relevant variables and ease of training in their use.

The speech display simulation system has been implemented on the CDC 1604 installation at the Coordinated Science Laboratory at the University of Illinois. This system contains a high-resolution variable intensity CRT display equipped with facilities for taking both still and moving pictures.
The main advantage of using such a system to generate speech displays is the flexibility inherent in a computer simulator. Using this type of system it is extremely easy, once the basic processing programs have been written, to modify displays and to create new ones. There are no time consuming hardware modifications to be made. Of course the main disadvantage is the cost of the computer system. Once a suitable display design has been found, however, a hardware version can be fabricated. Alternatively, a time sharing educational system such as the PLATO system might be used to allow access to a large scale computer at a minimum cost. If low cost display units such as the plasma panel (see Bitzer, et al. [1966] or Willson [1966]) become readily available and are capable of producing the type of displays needed, this latter implementation might be an inexpensive way of providing a variety of display types to the various institutions needing them.

4.1 Outline of the Study

The development of this study was organized into the following steps:

1) The development of a basic subsystem for inputting speech signals into the computer. Because of the slowness of the CDC 1604, it was not possible to run the complete speech display simulation system in real time. As a result the speech input subsystem has been oriented around the tape units as a storage medium. The I/O programs were used to read in data from an audio tape recorder attached to an A to D converter and to write out the data in a packed format on magnetic tape. This data was edited by the operator by means of various data manipulation programs. Eventually the desired data was copied to a new tape, complete with header blocks describing the data. This edited data tape was then used as the input to the processing routines.

2) The development of various speech processing routines general enough to be used by a variety of display types. These include such routines as peak detectors, zero crossing detectors, rectifiers, fast Fourier transforms, digital filters, etc.

3) The development of the speech display routines. These routines make use of the speech processing routines plus other types of general routines to produce specific displays. The types of displays are explained in detail in Section 5. Some were designed from descriptions in the literature (see Section 3) and others were developed more or less independently. As new speech processing routines were found to be necessary, they were developed and added to the programs written in step 2.

4) The production of display photographs for use in experimental comparison tests. A limited number of words were picked, and recordings of these words correctly spoken by several people were made. After being converted to digital tape, these recordings were processed by the various routines to produce the desired displays. These displays were photographed using a Polaroid camera, and each resulting picture was rephotographed to produce a 35 mm slide, which could be shown to subjects by means of either a slide viewer or a projector.

5) Finally, two types of tests were conducted on each of several types of displays to determine their relative effectiveness in displaying speech. A preliminary test was conducted for the availability of the proper information for word discrimination. The preliminary test was a type of concept attainment experiment in which the subjects must try to identify each word from its display. The point of the preliminary test was to determine if a given display type presents the proper information for word identification. In other words, is the transformation appropriate? Since it is fairly well established that this type of concept identification task is a hard (if not impossible) task in the general speech display case, this test was made using a limited number of words.
The point of the preliminary test was to determine if a given display type presents the proper infor- mation for word identification. In other words, is the transformation appropriate? Since it is fairly well established that this type of concept identification task is a hard (if not impossible) task in the general 32 speech display case, this test -was. made using a limited number of words. A final test to determine the displays' usefulness in a com- parison situation such as would exist in the eventual system was also con- ducted. In this test the subjects were presented with pairs of photograph which represented two different utterances as depicted by one of the dis- play types. The two utterances could be the same word spoken by two diffe ent people, different words which sound similar, or a correctly and in- correctly pronounced version of the same word. The subject's task was to determine if the two displays represented the same word. After his respon he was told the correct answer. As a further test the subject was occasio: ally asked to indicate points of similarity or difference. Then on the basis of the number and type of errors made on each display, a comparison between the various display methods was made. With the completion of the comparison tests the scope of the present study ended. There are still other problems. In particular the question arises that even if the subject can correctly detect a difference between two displays, he may not know how to change his pronunciation to make the display of his version of the utterance more like the standard. However, in order to test out this problem a real-time display is essentia Therefore for the time being, this problem will be postponed. In conclusion the goal of this study was to develop several types of visual speech displays and then perform comparison tests on them to determine their relative and absolute suitability for use as visual speech feedback devices. h.2 Theoretical Significance of the Comparison Tests As was previously mentioned, the theoretical basis for the com- parison tests used in this study comes from that part of the psychological 33 literature, dealing with, cognitiye processes ? which- has. come .to be called "concept identification^ 1 or "concept formation". The testa themselves involve the establishment by the subject of various response categories, i.e. the -words, based on generalized concepts "which must be developed by- looking at the various instances of these categories: as: depicted by the particular display type being tested. Xn order to do this the subject must select those attributes from the display instances which are most relevant to the des crimination process and determine how these attributes indicate the proper response categories. Over the past few years there has been a great deal of discussion in the literature of concept identification about the exact method used by subjects in the development of concepts in this type of situation. Restle [1962], Bruner, Goodnow and Austin [1962] and Haygood and Bourne [1965] all discuss various types of strategies for selecting and testing hypotheses about the cues which will lead to a correct classification. Haygood and Bourne [1965] break the process down into two problems: finding the attri- butes of the various instances which are important in determining the con- cept (s) and finding the rules involved in combining the values of these attributes. The attributes may vary in their obviousness and the rules may be either simple, i.e. 
merely the presence of a particular value of the attribute, or complex, i.e. some logical relation between several attributes. Bower and Trabasso [1963], in discussing two-category problems (i.e. the concept is simply the presence or absence of a particular value of one of the attributes), develop an expression for the probability that the subject will focus attention on the relevant attribute, namely

    P = W_a / (W_a + W_i)

where W_a is the attention value of the relevant attribute, summarizing all of the factors determining the subject's selection of it for testing, and W_i is the sum of these values for the irrelevant attributes.

In a more complicated situation, such as the present case of speech displays, there are other factors to be considered as well. In the first place, there may be redundant attributes which would help to establish the response categories. These may be wholly redundant, or they may be only partially redundant and thus only help in some of the cases. Secondly, the rules involved in combining the attributes are probably more complex than the simple presence or absence of a particular value of an attribute. Some of this complexity will be due to the inherent complexity of the speech code, and some of it will be due to the partial redundancy of some of the attributes. Finally, the fact that we are working in a multiple-response category situation will increase the complexity of any such formulation.

As a result of all of this, it would be very difficult to develop any kind of precise mathematical formulation for the probability of achieving concept discrimination in the present case, and in fact this is not really necessary. All we actually need are a few qualitative predictions.

Basically, the preliminary test is meant to be a concept formation situation in which the subject must learn to identify words based on the cues being presented by the display type being tested. It is hypothesized that the speed with which the subject attains the "concepts" of the words as represented by that particular display is directly related to the probability of concept attainment after a number of trials. This in turn is hypothesized to be related to the number and effectiveness of the relevant cues and inversely related to the number of irrelevant cues presented by the display. If the words selected for display are sufficiently typical of the normal speech sounds encountered in spoken language, and if several speakers are used to get a typical set of speaker variations, then, provided that there is a difference in the effectiveness of the various displays, the results should be significant. By measuring the length of time it takes to achieve a given criterion of performance on a particular display, we should obtain an indication of the relevance of that display type to the problem of word identification.

The purpose of the second test was to determine, for each display, the type of variations of the words which can be accepted as unimportant. Since the second test takes place after the subject has gone through the first test phases, the subject will have become somewhat proficient (hopefully) at understanding the display. Thus this test is akin to a concept discrimination task in which the subject is trained to make finer and finer distinctions.

Chapter 5
DISPLAY DESCRIPTIONS

The purpose of this section is to give a detailed description of the various types of displays which have been produced by the Speech Display system.
Each display will be described separately in general terms along with the different variants which are possible. Photographs of these various displays will also be given. Before describing the speech display types themselves, it will be desirable to describe the two main display packages which these speech displays use: the variable-intensity TV scan display and the continuous line display.

5.1 Variable-Intensity TV Scan Display

This display program package takes a two-dimensional array of intensity points and produces a continuously varying intensity display. The programs interpolate between the points in the array in both dimensions and set up a TV scan display buffer which is plotted and photographed by the system display routines. There are a variety of display choices which can be specified by the user:

1) The number of points to be interpolated between the array entries in both the horizontal and vertical directions.

2) The distance between points which are plotted by the system display routines (this will affect the "grain" of the resulting display).

3) The position relative to the left-hand side of the display at which the actual data will begin to be displayed. (This allows a given speech display to be centered.)

4) The minimum intensity below which the data will not be displayed. (This helps to eliminate low-intensity clutter which takes time to display but which adds no real information.)

5.2 Continuous Line Display

This display package produces a continuous line display using either an x vs. y type data format or a format in which one data array is plotted sequentially in the horizontal direction. The main display options are the maximum x and y values and the type of display.

5.3 Spectrogram

At the present time the spectrographic display is the most versatile display in the sense that it can be varied in the greatest number of ways. As described in Section 3, it is a linear time display in which frequency is plotted along the vertical axis and time along the horizontal axis. The intensity of the display at any given point is proportional to the magnitude of the particular frequency component at the time represented by that point.

The actual frequency analysis is done using a Fast Fourier Transform program initially written by Gary Horlick of the Coordinated Science Laboratory and subsequently modified by the author. The algorithm is a variant of the original Cooley-Tukey algorithm (see, for example, Cooley and Tukey [1965], Gentleman and Sande [1966], Cochran et al. [1967], or Brigham and Morrow [1967]). More recently, Alan Oppenheim [1970] has presented a very good article on the use of the FFT in producing spectrograms.

Since the FFT is a discrete transform, the output frequency magnitudes are, in effect, samples of the frequency spectrum of the data being analyzed. The spacing of these frequency samples is determined by the fundamental frequency of the time period being analyzed, and this in turn is determined by the number of samples being processed. Thus it is possible to decrease the spacing between frequency samples by increasing the number of time samples processed. This produces a more detailed frequency analysis, but only at the cost of having a larger time slice. This effectively means that although you have gained more information about the frequency analysis, you are less sure about the position in time to which it applies.
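The tradeoff can be made concrete with a short sketch. The following is a minimal re-expression in modern Python of the slice-by-slice analysis described above (the actual system was written in CSL FORTRAN; the names fs and nsamt are illustrative, nsamt standing for the number of samples per time slice shown in Figure 2):

    import numpy as np

    def spectrogram(samples, fs, nsamt):
        # Cut the signal into slices of nsamt samples and Fourier transform
        # each one; each row of fint is one time slice of the display.
        nslices = len(samples) // nsamt
        fint = np.empty((nslices, nsamt // 2 + 1))
        for i in range(nslices):
            time_slice = samples[i * nsamt:(i + 1) * nsamt]
            fint[i] = np.abs(np.fft.rfft(time_slice))
        return fint        # frequency increment between columns: fs / nsamt

Assuming, say, a rate of 10000 samples per second, nsamt = 512 gives a frequency sample every 19.5 hz, but each spectrum then summarizes 51.2 ms of speech; halving nsamt doubles the frequency spacing and halves the time slice.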
In addition to the time-frequency tradeoff, it is also possible to adjust the number of frequency components to be displayed and thereby vary the total frequency spread of the display.

Once the frequency components have been calculated for each time slice in the display, a linear normalization is performed on the data so that the intensity values will be within the range of values used by the CRT. The value given to the maximum component in the display can be adjusted to be greater than the maximum intensity which can be displayed by the CRT. Since any intensity values greater than the maximum displayable value are truncated to the maximum intensity value by the display routines, this allows the user to specify a value range over which he desires truncation of the intensity values. This feature is valuable because in any given spectrogram there are always a few points which are way out of line with the rest of the values, and by truncating these points the remaining points can be given a greater spread of values.

A second form of contrast enhancement can be used, namely high frequency emphasis. This simply involves multiplying each frequency component in every time slice by a factor greater than or equal to 1, with the factor increasing as the frequency of the component increases. In the actual program the emphasis begins at around 2000 hz, and the user in effect controls the rate of increase in the multiplicative factor.

In addition to these options, the spectrographic display can make use of the various options available in the variable intensity display package. Figures 1 through 3 show various examples of spectrographic displays using various sets of parameters.

Figure 1. Effect of Variations in High Frequency Emphasis and Intensity Truncation Using the Word "Shod" (panels a through i: no, medium, and high emphasis crossed with no, medium, and high truncation)

Figure 2. Effect of Variations in Time Slice Size (NSAMT = 256, 512, and 1024)

Figure 3. Examples of the Spectrographic Display with Nominal Parameter Values (the words "shod", "vile", "said", "ted", and "dame" spoken by speakers a through d)

5.4 Formant Extracting Display

The formant extracting display is similar in format to the spectrographic display. However, in this type of display the formants are extracted from the display data and all other display data in the frequency regions of the formants is suppressed. This allows the formant movements to be seen more clearly and at the same time retains the high frequency fricative information.

The formant extracting process essentially takes the frequency analysis of each time slice and finds its major peaks. This involves utilizing a peak-picking routine twice (see Figures 4 and 5). The first pass over the frequency analysis data obtains the minor peaks which represent the various harmonics of the fundamental pitch frequency. The second pass over this data obtains the peaks which can be considered to be formant candidates. The four largest formant candidates are then selected and analyzed. If the smallest candidate is less than half the size of the next smallest, it is eliminated. Any candidates over 4000 cps are also eliminated, since it is unlikely that a true formant would appear in that frequency region.
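The two-pass process and the pruning rules can be sketched as follows; this is a reconstruction from the description above rather than the actual routine, and the function names are illustrative:

    def peaks(values):
        # Indices of local maxima: a simple one-pass peak picker.
        return [i for i in range(1, len(values) - 1)
                if values[i - 1] < values[i] >= values[i + 1]]

    def formant_candidates(spectrum, freq_step):
        # First pass: all peaks, i.e. the harmonics of the pitch frequency.
        harmonics = peaks(spectrum)
        # Second pass, over the harmonic peaks alone: the envelope peaks,
        # which are the formant candidates.
        envelope = [spectrum[i] for i in harmonics]
        candidates = [harmonics[j] for j in peaks(envelope)]
        # Keep the four largest candidates.
        candidates.sort(key=lambda i: spectrum[i], reverse=True)
        candidates = candidates[:4]
        # Drop the smallest if it is under half the size of the next smallest ...
        candidates.sort(key=lambda i: spectrum[i])
        if len(candidates) >= 2 and spectrum[candidates[0]] < 0.5 * spectrum[candidates[1]]:
            candidates = candidates[1:]
        # ... and drop anything above 4000 cps, where a true formant is unlikely.
        return sorted(i for i in candidates if i * freq_step <= 4000.0)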
Once the unlikely candidates are eliminated, the frequency analysis in the region covered by the remaining formants is erased and replaced by the magnitudes of the formants at their corresponding frequencies. The results of this type of analysis are shown in Figure 6. It should be noted that this algorithm never determines which formant is the first, which is the second, etc. This is a non-trivial problem, since some of the "formants" selected may still occasionally be noise, and quite often "real" formants will drop out or merge for several time slices. The only way to effectively determine the actual number associated with each formant would be to keep a record of the movements over time and, on the basis of this record, determine which peaks in a given time slice correspond to each formant.

Figure 4. Effect of the Peak-Picking Process on the Spectrum Analysis of a Single Time Slice (initial spectrum analysis; first pass selects all peaks due to pitch harmonics; second pass selects potential formants; final formant selection)

Figure 5. Effect of the Peak-Picking Process on the Full Spectrographic Analysis of the Word "Dead" (initial spectrographic display; peaks after first pass; peaks after second pass; final formant display)

Figure 6. Examples of the Formant Extracting Display (the words "shod", "vile", "said", "ted", and "dame" spoken by speakers a through d)

5.5 Zero-Crossing Display

The zero-crossing display is a linear time display in which the frequency equivalent to the zero-crossing rate is plotted on the vertical axis and time on the horizontal axis. The speech input is fed to four digital filters, the outputs of which are then analyzed to determine their zero-crossing rates. A single point is plotted for each filter output, the magnitude of the point being proportional to the magnitude of the output of the corresponding filter. The frequency regions have been chosen so as to approximate the regions covered by the first, second, and third formants, with the fourth region being a high frequency region for fricatives or other noise-like sounds. Examples of this type of zero-crossing display are shown in Figure 7.

Figure 7. Examples of the Zero-Crossing Display (the words "shod", "vile", "said", "ted", and "dame" spoken by speakers a through d)

5.6 Zero-Crossing vs. Amplitude Envelope

This display is a simulation of the display described by Pyron and Williamson [1965]. There are actually two variants, one using the zero-crossing rate of the original speech signal, Z1, and the other using the zero-crossing rate of the derivative of the speech signal, Z2. (This latter signal can also be thought of as the "zero slope" or maximum-minimum rate.) One of these two signals is plotted against the amplitude envelope of the speech signal to produce an x-y type speech display.

A block diagram showing the production of these two display variants is shown in Figure 8. Note that in producing the y input, ...

Figure 8. Block Diagram for Z1 and Z2 vs. Amplitude Envelope Display (stages include a zero-crossing rate extractor and a minimum threshold detector)
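Both of the last two display types rest on extracting a zero-crossing rate and converting it to an equivalent frequency. A minimal sketch follows, with band edges that are purely hypothetical stand-ins for the four digital filters (the report does not list their cutoff frequencies):

    import numpy as np

    def zero_crossing_frequency(x, fs):
        # A pure tone of f hz crosses zero 2f times per second, so the
        # frequency equivalent of a zero-crossing rate is crossings/(2T).
        s = np.signbit(x)
        crossings = np.count_nonzero(s[1:] != s[:-1])
        return crossings * fs / (2.0 * len(x))

    def bandpass(x, fs, lo, hi):
        # Crude brick-wall band-pass via the FFT, standing in for one of
        # the four digital filters of the actual display.
        spectrum = np.fft.rfft(x)
        freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
        spectrum[(freqs < lo) | (freqs > hi)] = 0.0
        return np.fft.irfft(spectrum, n=len(x))

    # Hypothetical band edges approximating the first three formant regions
    # plus a high band for fricatives:
    bands = [(200, 900), (900, 2500), (2500, 3500), (3500, 7000)]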
Figure 9. Examples of the Z1 vs. Amplitude Envelope Display

Figure 10. Examples of the Z2 vs. Amplitude Envelope Display (the words "vile", "said", "ted", and "dame" spoken by speakers a through d)

Chapter 6
SPEECH DISPLAY SIMULATION SYSTEM

The Speech Display Simulation System can be divided into four main areas: the common data base, the command processor, the speech display routines, and the various subprocessing routines.

6.1 The Common Data Base

The common data base consists of the input speech data buffer, BUFF, the output display data buffer, FINT, the CRT display command buffers, ISCOP1 and ISCOPE, and all of the constants and variables used to control these buffers. These buffers and variables are all kept in COMMON storage. The problem of keeping the COMMON declaration in each subroutine identical is handled by means of the CSL FORTRAN title feature. This extension of the FORTRAN language allows the programmer to specify FORTRAN statements which will then appear in every program in which the statement TITLE* appears. Any type of valid FORTRAN statement can be put in the title, and thus the whole common data base need only be written down once.

The common data base has several key features. Since the CDC 1604 was not fast enough to process speech input in real time, it was necessary to use digital tape for storing the input speech data. As a result, it became unnecessary to provide a full-sized buffer to contain a complete speech utterance. Instead, the floating point buffer, BUFF, is used to contain only that portion of the data which is of current interest.

As can be seen in Figure 11, there are two corresponding pointers for the data tape and the buffer, BUFF. ISAMP is the main data pointer and selects the initial sample of a set of data points from the complete set of data (consisting of many speech utterances) on the data tape. Its value may range up to around 900000, since this is the approximate number of packed sample points which can be written on a single tape. ISAMPB corresponds to ISAMP in that it points to the same data as ISAMP, but it refers to the data as it happens to be currently loaded in BUFF. Thus ISAMPB only varies from 0 to the maximum length of BUFF (currently 3000 words).

Figure 11. Relationship Between ISAMP and ISAMPB (ISAMP indexes the data tape, in packed integer format; ISAMPB indexes the buffer, in unpacked floating point format; e.g. ISAMP = 54627 corresponds to ISAMPB = 1627)

The display generating routines are free to move ISAMP up and down the data tape whenever they wish. Before they utilize this new data position, however, they must call the subroutine ADJUS2. This subroutine checks BUFF, and if the data corresponding to the new position of ISAMP is not currently in BUFF, it moves the tape forward or backward until it can load BUFF with the proper data and converts it to floating point. Once BUFF is made to contain the desired data, ADJUS2 sets ISAMPB so that it can be used as an index for BUFF to obtain the desired data. It is this pointer that the speech processing programs use to obtain the speech data.

The second feature of the common data base involves the FINT array. This array is basically a two-dimensional array containing intensity values with its dimensions corresponding to frequency vs. time.
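The pointer discipline just described amounts to keeping a movable software window over the tape. A sketch of the idea follows (in Python; the real ADJUS2 positions a physical tape, and the reload policy shown here, which simply restarts the window at the requested sample, is an assumption):

    class TapeBuffer:
        # Sketch of the ISAMP/ISAMPB scheme: BUFF holds only a window of
        # the very long data tape, and ADJUS2-style logic reloads the
        # window when a requested sample falls outside it.
        BUFLEN = 3000                  # length of BUFF in the real system

        def __init__(self, tape):
            self.tape = tape           # stands in for the digital data tape
            self.base = 0              # tape index of buff[0]
            self.buff = [float(v) for v in tape[:self.BUFLEN]]

        def adjust(self, isamp):
            # Return ISAMPB, reloading BUFF first if necessary.
            if not (self.base <= isamp < self.base + len(self.buff)):
                self.base = isamp      # "move the tape" to the new position
                window = self.tape[isamp:isamp + self.BUFLEN]
                self.buff = [float(v) for v in window]   # unpack to floating point
            return isamp - self.base   # ISAMPB indexes buff directly

    # e.g.: buf = TapeBuffer(list(range(100000)))
    #       isampb = buf.adjust(54627)   # window reloaded if needed
    #       sample = buf.buff[isampb]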
However, it was felt that it would be much more convenient to be able to vary the relative maximum sizes of these two dimensions even while the total length of the array remains fixed. This is especially nice for short speech samples in which it is desired to have a spectrographic analysis with a very small increment between frequencies, since in this case the maximum index for the frequency dimension must be increased. Unfortunately, FORTRAN has no provision for dynamically assigning array dimensions. Therefore it was decided to require each program using FINT to calculate its own subscripts using a frequency maximum index, IFMAX, which could be dynamically chosen by the operator. At first this seemed like a lot of extra work, but the technique is relatively straightforward and in many cases it resulted in a considerable increase in speed due to the lamentably inefficient calculations used by CSL FORTRAN to calculate subscripts. This was especially true in loops, since the compiler makes no optimizing attempts.

6.2 The Command Processor

The command processor is the heart of the interactive communication with the system. It gives the operator the ability to change the values of the system constants and variables and to call the various display routines. In addition, he can dump out the contents of the various arrays and variables. The command processor includes the main program and the subroutines directly called by it, namely INPTCM, which reads each command with its parameters, and the various command identifying subroutines, which determine the command and perform the requested operations. At the present time, INPTCM accepts only fixed format commands. However, it is hoped that it will eventually be possible to expand it to a free format subroutine.

The command identifying operations have been kept as general as possible. The commands are grouped together according to function into subroutines. Each subroutine has the task of identifying those commands associated with it and then executing them. Since the subroutines are independent of one another, it is relatively easy to expand the command set simply by adding commands to the relevant subroutine or by writing a completely new subroutine and adding a call to it in the main program.

The conventions for intercommunication are relatively simple and yet allow a high degree of flexibility. Each subroutine accepts as parameters a character variable containing the command and as many of the input parameters read in by INPTCM as may be necessary. If the subroutine determines that the command is not one for which it is responsible, it simply returns. If the command is one of the subset of commands which it can execute, it performs the required operations. Then, before returning, it sets the command variable to zero to indicate to the main program that the command was executed. Thus, after calling all of the command identifying subroutines, the main program merely needs to check the command variable for zero to see if the command was executed. If it was not, then the main program types out a message saying that the command was not recognized.

Note that this technique presents a wealth of opportunities. For example, a command identifying program, as part of its command execution step, could load the command variable with a new command instead of loading it with zero. This command could then be executed by some subsequent command identifying program. This in fact has been done in the present system.
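The dispatch convention and the command-chaining trick can be sketched as follows. The commands F and FOWD and their meanings are taken from Table 2; the mechanics of the loop are a reconstruction, not the actual main program:

    def tapcom(cmd, nval, state):
        # One command-identifying routine: it executes the commands it
        # owns, returning 0 to mean "executed", a new (command, nval) pair
        # to chain, or cmd unchanged to pass the command along.
        if cmd == "FOWD":
            state["isamp"] = state.get("isamp", 0) + nval   # forward NVAL samples
            return 0
        if cmd == "F":
            return ("FOWD", 1000)       # F is the short form of FOWD = 1000
        return cmd

    def process(cmd, nval, state, routines):
        # Main-program loop: offer the command to each identifying routine
        # until one claims it; a chained command restarts the scan.
        while cmd != 0:
            for routine in routines:
                result = routine(cmd, nval, state)
                if result == cmd:
                    continue            # not this routine's command
                if isinstance(result, tuple):
                    cmd, nval = result  # chained command, executed next
                else:
                    cmd = 0             # executed
                break
            else:
                print("command not recognized:", cmd)
                return

    state = {}
    process("F", 0, state, [tapcom])    # chains to FOWD 1000; state["isamp"] == 1000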
To extend the idea even more, the command variable could be generalized to a push-down stack. Then one could have complex commands which actually represent a series of simpler commands. The execution of the complex command would consist of expanding it into the simpler series of commands and pushing these onto the stack. The main program would pop the stack each time a command was completed and then repeat the identification and execution process for the newly exposed command at the top of the stack. The main program would only return when the stack was empty. The key point to note (and the one which illustrates the general philosophy of the system) is that this stacking process could be added without modifying the programs which already exist.

Some of the commands which can be executed by the system are given in Table 2. In addition to being able to run the various display programs and diagnostic routines and to manipulate the data tapes, the command system allows the operator to change many of the system variables. This allows him to easily modify the various displays. It also causes a certain number of problems due to the manner in which some of the system variables and constants interact. An example of this problem occurs in the spectrographic display, where the number of samples to be processed per time slice fixes the interval between frequency coefficients and vice versa. The solution to this problem was to allow the user to set certain parameters independently and then have the system calculate the effect of these choices on the other dependent parameters and print them out (this operation is performed by the FINI subroutine). Thus, for example, the operator can choose the desired number of data samples he wants processed per time slice in the spectrographic display, and the system will respond by indicating the frequency increment between coefficients and the total frequency range which will be displayed given the current value of IFREQ.
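The dependent-parameter report is simple arithmetic: for a sampling rate fs, the increment between frequency coefficients is fs/NSAMT, and IFREQ coefficients then span IFREQ times that increment. A sketch (the sampling rate in the example is assumed, not taken from the report):

    def fini(fs, nsamt, ifreq):
        # Report the dependent parameters: choosing NSAMT (samples per
        # time slice) fixes the increment between frequency coefficients,
        # and IFREQ (coefficients displayed) then fixes the display range.
        increment = fs / nsamt
        frange = increment * ifreq
        print("frequency increment: %.1f hz, display range: %.0f hz"
              % (increment, frange))
        return increment, frange

    fini(fs=10000.0, nsamt=512, ifreq=128)   # 19.5 hz increment, 2500 hz range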
    Command   Executing Subroutine   Operation
    BEGN      TAPCOM                 Rewind data tape & initialize system
    BUFF      DIAGNG                 Print out buffer contents
    C         DIAGNG                 Next input will be a comment
    COPY      DATGCL                 Copy data tape
    DISP      DIAGNG                 Display buffer contents on CRT
    F         TAPCOM                 Short form of FOWD = 1000
    FINIS     PROSCL & DATGCL        Calculate dependent variables & turn off
    FIND      TAPCOM                 Search data tape for specified speech word
    FORME     PROSCL                 Call FORMEX display routine
    FOWD      TAPCOM                 Move data tape forward NVAL samples
    HEADT     TAPCOM                 Process header block
    HIEMP     PROSCL                 Add high frequency emphasis to display data
    INITT     PROSCL & DATGCL        Initialize system variables
    INTAP     DIAGNG                 Assign input command medium
    IWIDE     TAPCOM                 Assign window size for data tape display
    LOCA      TAPCOM                 Print out value of data pointer
    MOVE      TAPCOM                 Move data pointer to NVAL
    NORMF     PROSCL                 Normalize display data
    OBTAI     DATGCL                 Use A to D converter to obtain speech data
    PHOTO     PROSCL & DATGCL        Take picture of last display
    PYRON     PROSCL                 Call PYRON display routine
    READF     DIAGNG                 Read out display data stored on tape unit 3
    REWIN     DIAGNG                 Rewind tape unit NVAL
    SAVEF     DIAGNG                 Write display data onto tape unit 3
    SPDIS     PROSCL                 Display the display data array on the CRT
    SPECT     PROSCL                 Call SPECTO display routine
    STAND     PROSCL                 Produce a standard spectrographic display
    THRSP     DATGCL                 Call THRSPIC data processing routine
    WHATN     PROSCL & DATGCL        Call WHATNOW subroutine
    ZEROC     PROSCL                 Call ZEROC display routine

Table 2. Commands Executed by the Speech System

6.3 The Speech Display Routines

The speech display routines consist of the programs used to simulate the various speech displays. These programs manipulate the common data base, using the various subprocessing routines, to produce the displays desired.

There are two basic formats for the output data. The three-dimensional linear time displays are generally represented as a two-dimensional FORTRAN array (stored in FINT) with each element containing a quantity representing the intensity of the corresponding point on the display. The display routines can then normalize the data (performing such operations as high frequency emphasis, if desired), interpolate between data points, and produce a smoothly varying, multi-intensity level display. The x-y type of displays are represented as two arrays of the corresponding x and y coordinates of successive points in the display. These points can then be displayed as a continuous line using other system display routines. In addition, other variants can be produced. In particular, a trivial modification of the above display program allows a single variable array to be plotted against time (i.e. successive values of the array are plotted vs. equidistant intervals on the x axis).

6.4 The Subprocessing Routines

The subprocessing routines consist of the programs which are used to perform various operations and transformations of data. Each routine performs a single type of operation and might be used in the construction of several different displays.

In order to insure their flexibility of use, the subprocessing routines have all been programmed to conform to a certain general form. In particular, each program receives as its input a data array and a variable indicating the number of points to be processed. The output may or may not be an array. If it is an array, and if the output array contains the same number of points as the input, the program is written so that the same array can function as both input and output, if desired. If the number of points in the output array is different from the number of input points, this number is specified as an output parameter. In general, all intermediate data arrays used in the processing of data in the subprocessing routines are specified as parameters.
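The calling convention can be illustrated with one of the routines named in stage 2, a rectifier; this is a sketch of the convention only, not the actual CSL FORTRAN code:

    def rectify(data, n):
        # A subprocessing routine in the general form described above: the
        # inputs are a data array and the number of points to process, and
        # since the output has as many points as the input, the same array
        # serves as both (here, full-wave rectification).
        for i in range(n):
            data[i] = abs(data[i])
        return n                    # number of output points

    buff = [0.5, -1.2, 3.0, -0.1]
    rectify(buff, len(buff))        # buff is rewritten in place: [0.5, 1.2, 3.0, 0.1]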
This allows the calling programs to have complete control over the storage allocation of arrays and results in a considerable savings in space.

In order to avoid the variety of problems created by passing subroutine parameters through COMMON, this practice was generally not used. By passing all of the parameters explicitly, the routines are easier to understand and have many fewer mysterious side effects. There are two exceptions to this rule, however. One is that certain system constants were allowed to be obtained directly from COMMON, e.g. the sampling frequency, etc. In general, the variables which are passed in this manner are those whose use and meaning are unlikely to change as the system matures. This lowers the probability of having to rewrite the subprocessing routine later on. The second exception involves short subroutines which are used very often, i.e. in "tight loops". In such cases the overhead involved in handling explicit parameters becomes excessive, so that passage through the COMMON area becomes necessary.

6.5 Basic System Principles

As the Speech Display Simulation System developed, certain key principles emerged:

1) The common data base, command processor and speech display routines should be basically machine independent. This means that they should be written in standard FORTRAN as much as possible, and any use of CSL FORTRAN extensions should be fully documented by means of comment statements in the code itself.

2) The subprocessing routines may be written in machine language or in a combination of FORTRAN and machine language as is allowed in the CSL FORTRAN system. However, this should only be done if a significant speedup in time or savings in space results, or if it is necessary to perform some special function, such as communicating with the CRT display unit. In either case all occurrences of machine code should be explained, both in the overall sense and at the detailed instruction level, by comments within the program.

3) Test programs used to check out the various subprocessing routines are not normally to be loaded with the rest of the system. They are kept on the library tape, however, so that when needed, they may be easily loaded by making a call request to the CSL Operating System. These programs should be well commented, with exact instructions on their use, since it is easy to forget their operation within a matter of weeks if they are not used regularly.

The complete descriptions of the various programs used in the Speech Display system are given in Nordmann [1971], along with the program listings, test programs and sample outputs.

Chapter 7
RESULTS

The basic simulation system has worked quite well and proved quite adaptable as time went by. The major problem with the system at the present time is the amount of inconvenience involved in producing a digital data tape which can be used by the processing routines. Although the recording and playback through the A to D converter is easy enough, the decision about what to save and put on the permanent data tape must be made on an individual basis by the operator. There are routines which can be used to assist in this operation, such as THRSPIC, which will print out, for each block on a tape, the number of samples above any particular threshold value chosen by the operator. However, the basic decision as to where the word starts and ends must be made by the operator.
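The behavior described for THRSPIC can be sketched as follows (a reconstruction of the described output; whether the threshold is applied to the sample magnitude, as assumed here, is not stated in the report):

    def thrspic(blocks, threshold):
        # Print, for each block on a tape, the number of samples above a
        # threshold chosen by the operator, as an aid in deciding where a
        # word starts and ends.
        for nblock, block in enumerate(blocks):
            count = sum(1 for sample in block if abs(sample) > threshold)
            print("block %d: %d samples above %g" % (nblock, count, threshold))

    thrspic([[0.1, -0.9, 0.4], [0.0, 0.05, -0.02]], threshold=0.3)
    # block 0: 2 samples above 0.3
    # block 1: 0 samples above 0.3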
In a real-time system it would be possible to get around such a problem by using a push button to indicate when the computer should "listen". At any rate, at the present time, this task is somewhat tedious. Once it is accomplished, however, the production of the displays is fairly simple.

The testing of the displays turned out to present quite a few problems, mostly revolving around the expense of a really comprehensive testing procedure. In the end it was necessary to restrict the amount of testing done, with the result that the tests which were performed cannot in any way be considered definitive. However, several procedural variations were tested, and certain generalizations can be made about the restricted tests which were performed.

In the end it was only possible to get two subjects who were able to complete a full set of tests, and even these subjects were not able to run a full series on every display type. In addition, several other subjects completed various parts of the test series for specific display types. As a result it is impossible to make any statistically significant generalizations about the results, and no type of statistical analysis was even attempted. It is hoped, however, that the results will prove useful in indicating the types of tests which might be useful in the future.

7.1 Recordings

The first area which became restricted was the recorded data itself. In order to minimize the number of utterances to be processed, the test vocabulary was restricted to the 40 words listed in Table 3. The words were chosen so as to give a distribution over the full range of vowel sounds and at the same time allow maximum testing between words differing by only a single phoneme. Four speakers were used, three female and one male, to produce a total of 160 utterances.

It was also intended to use a set of recordings of the Modified Rhyme Test (see Kreul et al. [1968] or Beyer et al. [1969]) produced by the Stanford Research Institute and available from K-G Recording Service, 4311 Miranda Ave., Palo Alto, California. These recordings were originally produced to be used in speech discrimination tests, but they were felt to be appropriate for the present purpose. Unfortunately, a variety of equipment difficulties, some of which were never solved, prevented their conversion to digital tape. The result was that the number of utterances available for the second type of test was not really large enough.

The recordings of the 40 word list were produced in a quiet room using untrained friends of the author as speakers. The equipment used consisted of an Allied M3310 cardioid microphone attached to one channel of an Allied T-1070 stereo tape recorder. The use of untrained speakers produced one rather severe problem which was not discovered until several trial test runs had been performed, namely that the words were not all enunciated clearly.

    shin     beet     dead     hag      sod
    four     mob      guff     ted      sore
    thin     shod     peat     knob     zed
    hang     cuff     June     thor     cage
    lynn     pang     vile     wage     said
    loon     chuck    gin      dame     file
    stuck    ned      lip      wig      hose
    tame     rip      mile     rang

Table 3. List of Recorded Words

This caused confusions between certain particular utterances by certain speakers, independent of the type of display used, since the recordings themselves were ambiguous. The effect of this problem will be discussed further in the subsections concerning the actual test results.
7.2 Data from the First Test

As described in Section 4, the first test was intended to help determine if it was possible to extract the necessary information from a given type of display to identify different words consistently. It was also intended to give a measure of the relative efficiency with which the various display types performed this task by measuring the length of time needed to reach a certain proficiency with the display.

The test items for the first test were selected from the list of 40 words which were spoken by the 4 speakers. Two separate groups of items were used: the first (test 1a) consisting of the words zed, said, vile, file, dame, and tame, and the second (test 1b) consisting of the words cuff, guff, mob, knob, shod, sod, ned and ted. The words in the two groups were chosen so as to provide pairs of words which might be easily confused if the displays were not in fact providing the proper cues. Unfortunately, with the limited amount of testing which could be done, it was not possible to test for the full range of confusions between all the various phonemes.

The procedure for the first test involved showing the subject slides of the displays produced by a particular display type and having the subject try to determine which word was being displayed. When the subject responded, he was told whether or not his response was correct and, if not, what word was actually being displayed. Initially the subject was allowed to look for five minutes at a labelled sheet containing pictures of all the slides in the test. Then the complete set was shown to him, one at a time, for as many times as was necessary for it to be learned. During the test the subject was allowed to use a written list containing the words in the group being displayed.

Measurements were taken of the number of trial sets necessary to reach the criterion level of response. This level was loosely defined as the point at which the subject began to level off in improvement and started making a more or less consistent set of mistakes. It was more specifically specified as four consecutive trial sets in which the number of correct responses did not vary by more than 10%. Tables 4, 5, and 6 give the learning rates of each subject for the spectrographic, zero-crossing, and formant extracting displays, respectively, in terms of the number of trial sets necessary to reach the criterion run and the average percentage correct during the criterion run.

Confusion matrices were also constructed using the test results. By keeping the effects of the various speakers separate from one another, it was possible to determine effects which might be due to a single speaker alone. Tables 7 through 19 give the confusion matrices for each subject during their criterion runs, arranged in order of the type of display. Each box in each matrix has room for five numbers. The upper and lower left hand corners contain the number of times a particular response was given for display instances of the particular word as it was spoken by speakers a and b, respectively. The upper and lower right hand corners contain the number of responses for instances involving speakers c and d, respectively. The number in the center position is simply the sum of the numbers in the four corners and represents the total number of times a particular response was given to a display instance representing the particular word, irrespective of which speaker pronounced it.
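The bookkeeping behind these matrices can be sketched directly from the description (the triple format used for a trial record here is illustrative):

    from collections import defaultdict

    def confusion_matrix(trials):
        # trials: (stimulus word, speaker, response word) triples.  Each
        # box of the published matrices holds the per-speaker counts in
        # its corners and their sum in the center.
        corners = defaultdict(int)
        centers = defaultdict(int)
        for stimulus, speaker, response in trials:
            corners[(stimulus, response, speaker)] += 1
            centers[(stimulus, response)] += 1
        return corners, centers

    corners, centers = confusion_matrix([("zed", "c", "said"),
                                         ("zed", "a", "zed"),
                                         ("zed", "c", "said")])
    # centers[("zed", "said")] == 2, both counts coming from speaker c.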
Table 4. Learning Rates for Spectrographic Display (sets to criterion and percent correct during the criterion run, tests 1a and 1b)

Table 5. Learning Rates for Zero-Crossing Display (tests 1a and 1b)

Table 6. Learning Rates for Formant Extracting Display (tests 1a and 1b)

Tables 7 through 19. Confusion matrices for the individual subjects during their criterion runs, for the spectrographic, zero-crossing, and formant extracting displays (tests 1a and 1b)

As might be expected, the small number of subjects has caused a great deal of confounding of data, since many of the possible sources of variance could not be balanced.
In particular, the order in which a subject learned the displays and the order in which the parts of the first test were given could not be varied in such a way as to cancel any possible variance which might be due to learning effects (across displays as well as during the learning of a single display).

As far as the learning data is concerned, there is a contradiction between the two parts of the first test which were performed on the different display types. The number of sets necessary to reach the criterion run and the percentage correct during the criterion run as recorded during test 1a would seem to indicate that the spectrographic display was easier to learn than the zero-crossing display. This same data on test 1b, however, tends to indicate the opposite. The most likely explanation appears to be that the differences in both cases are not large enough to be statistically significant given the small amount of data available.

The confusion matrix data shows several interesting points. In test 1a, there are very few confusions outside of the three basic word pairs, i.e. zed-said, vile-file, and dame-tame. This could be attributed to the fact that all three displays were able to satisfactorily distinguish vowels. More probably, however, it is due to the fact that the word pairs picked for this test have many differences among themselves, and thus there are many cues available with which to distinguish them. A much more selective test could have been devised if the word pairs had been more similar in their phonemic structure, e.g. if they all ended in the same phoneme and used the same middle vowel.

A slight example of the type of results which this improvement might produce is shown in the confusion matrices for test 1b for the spectrographic and zero-crossing displays. Both subject A and subject B had a certain amount of trouble distinguishing between the words "ned" and "mob" in the spectrographic display (refer to Tables 12 and 13). However, there was no such problem with the zero-crossing display (see Tables 16 and 17).

As can be seen from the various confusion matrices, there does appear to be a differential effect in the confusions of some of the test words based on which speaker's recordings were being used; e.g. "zed" in the spectrographic matrices for subjects C, D, and E is mistaken for "said" much more often in the case of speaker c than for any other speaker. It turned out that in the original recording it is in fact rather difficult to determine whether the word is a "zed" or a "said". The problem comes when we note that subjects A and B did not make this confusion.

This contradiction was eventually resolved by a comment one of the subjects made in regard to another similar situation, namely the word "vile" spoken by speaker b. This case was one in which there was a definite problem of consistent classification, but the particular display was so strikingly different that the subject simply put it in a class by itself and, after missing it once, he never misclassified it again. This effect probably occurred in other cases as well, and if common, it would not only obscure the effects of poor pronunciation, but it would also tend to invalidate the tests, since the subjects would be memorizing specific instances instead of general identifying principles. The main way to correct this problem would be to have more speakers and to have several different examples from each speaker.
Then, by having successive test sets composed of different instances, the subjects would never be able to memorize specific instances.

7.3 Data from the Second Test

The second test was intended to be a closer approximation to the final learning situation, since it would involve a comparison between two displays shown simultaneously. Its purpose was to obtain more detailed data on the effectiveness of the displays and on the tolerances which were involved in each type. Unfortunately, the Modified Rhyme Test recordings were found to be defective when played through the digital conversion apparatus of the display system. Thus it was necessary to use the same set of recordings as in the first test. But since there were not nearly enough instances in these recordings for a complete test, only a single test involving comparisons between seven words was attempted ("zed", "said", "ned", "ted", "sod", "shod", and "dead").

The actual procedures used in the test became rather complex due to some of the external restrictions placed on the experiment. Due to time and cost constraints, only one slide of each display instance was available. Thus an elaborate scheme had to be worked out whereby all possible comparisons of different instances could be performed with a minimum amount of slide shuffling between the two projectors. This was done by dividing the slides into two groups and working out all possible comparisons between the instances in the two groups. In order to keep the expectations of the two possible responses ("same" and "different") equal, it was necessary that approximately half of the matches in each set be the same word.

The various possible comparisons were written on small index cards and shuffled to give a semi-random ordering. Then the two sets of slides were placed in their respective projectors. One set was arranged so that the experimenter could project the slides in any arbitrary order. The other set was shuffled and then displayed one at a time by the subject in sequence, after the experimenter first noted down their order on a piece of paper. As the subject projected each of his slides in turn, the experimenter would pick the corresponding slide from his projector, based on the current index card notations being used, and project it next to the first slide. The subject would then respond "same" or "different", the experimenter would answer "right" or "wrong", and then both would go on to the next slide pair. When the set was completed, the subject's slides would be shuffled, the experimenter would select a new set of instances to match, and the process would repeat.

Since there were only single copies of each slide, it was necessary to rearrange the two display sets periodically to match other combinations which could not be obtained using the previous set divisions. By having two complete sets of slides this could be eliminated. It would also be possible to have longer test runs between shufflings and to lower the total number of runs necessary.

Tables 20 through 23 give the data recorded from the second test for the spectrographic and zero-crossing displays, respectively. Only two runs were made using this test, and the data is shown in two forms: a detailed matrix showing the results of the comparisons of the display instances of particular speakers, and a summary matrix showing the proportion of "same" responses. In the detailed comparison matrix the speakers are listed along the sides of the matrix for each word in the test.
For each instance pair tested, a letter will appear at the respective intersection in the matrix: a "d" if the response was "different", an "s" if the response was "same". If no letter appears, the pair was not tested, and if more than one appears, it was tested more than once. In the case of pairs of instances representing the same word, the proportion of "same" responses is given, since there were too many cases to write out a letter for each one.

Table 20. Detailed Comparison Matrix for Subject A, Test 2, Spectrographic Display (the words "zed", "said", "ned", "ted", "shod", "sod", and "dead", speakers a through d)

Table 21. Summary Comparison Matrix for Subject A, Test 2, Spectrographic Display (proportion of "same" responses for each word pair)

Table 22. Detailed Comparison Matrix for Subject A, Test 2, Zero-crossing Display

Table 23. Summary Comparison Matrix for Subject A, Test 2, Zero-crossing Display

The biggest problem with this test was the length of time it took to perform it. Subject A worked a total of about 6 hours on the test for the spectrographic display and was still able to see only approximately half of the total number of possible comparisons. A great deal of the time was taken up by the procedural problems mentioned above, and a double set of slides would probably cut down the amount of time needed by a significant factor. However, the fact remains that the procedure still will take a great amount of time because of the large number of instance pairs which must be tested.

The results from the comparison test show several interesting features. In general, the mistakes appear to be made on the same word pairs in both the zero-crossing and spectrographic display types, although the spectrographic display has a higher error rate in almost every case. This would tend to indicate that although the subjects have trouble on the same type of comparisons (at least as far as the words which were tested are concerned), the zero-crossing display tends to allow the subject to resolve the differences more accurately. (It should be noted, in regard to the problem of learning effects, that subject A performed the test for the zero-crossing display first.)
The detailed data from the comparison test agrees with the results from the first test in certain respects. In the cases where the same word spoken by two different speakers was presented, the subject tended to make errors on the same instances as in the first test. This effect is most noticeable in the case of "zed" spoken by speaker c. When this instance was compared to either speaker a's or speaker d's "zed" the subject made a high error rate, but he made a perfect score when speaker b was compared to a or c.

Chapter 8
SUMMARY AND CONCLUSIONS

The discussion of the experimental results can be broken down into two main areas: a discussion of the tests themselves and a discussion of the general ideas behind the testing.

8.1 Comments on the Tests Which Were Performed

Although the tests which were performed could not be used to establish reliable comparisons between the various display types, due to the small number of subjects which were used, they did indicate several points about the procedures to be used.

In picking out the words to be used in the test, an attempt was made to select a variety of words which would contain all of the common phonemes (at least in the English language). In order to minimize the total number of words, however, the selection was restricted in such a way that most of the words in the list differed from one another in several ways. It was originally felt that the effects of single phonemes could be determined from a multivariate analysis of data from the complete set of words. Unfortunately, the amount of motivation and work necessary to perform adequately on a test with 40 or 50 different word displays to remember is much more than the average subject will ever have. When smaller subsets of the word list are used, it is not possible to control all of the variance. Thus one basic change which should probably be made in the word lists is to use nonsense consonant-vowel-consonant syllables and to pick these syllables in such a way as to have subsets in which each "word" differs in only one phoneme. In order to keep the total number of words in the data set at a minimum, it will be necessary to restrict the number
This can only be answered when a real- time display system is developed and tests can be conducted on-line with the system. There are other problems as well. For one thing, the word identification type of testing which was used in this experiment is not exactly the same type of situation as will be needed in the final use of the system. It might very well be that the speech deformations encountered in training the deaf or teaching the pronunciation of a 95 foreign language are qualitatively different from the differences between the pronunciation of different words in a single language. In such a case, the present type of testing may be inappropriate insofar as determining the suitability of the various display types. This question can be solved by using the appropriate types of recordings and seeing if the results of the tests change in any way. One other objection to this technique is the difficulty of apply- ing it to the specialized displays which are often used in speech correc- tion, such as pitch indicators, nasality indicators, etc. In principle these types of indicators could probably be tested using the present techniques and the displays could probably be generated quite easily by the system. However, in the case of this type of display, a much simpler testing method could probably be devised. 8.3 Summary The purpose of the present project was to develop a computer speech display simulation system capable of generating a wide variety of speech displays from a recorded speech input. Eventually it is hoped that this will lead to a system whereby a person can obtain visual feedback as a corrective measure for word pronunciation'. The basic system would involve two displays, one representing the subject's pronunciation of a particular word and the other representing a correct pronunciation of the word. A computer would be used to process the Incoming speech and produce a display containing features highly relevant to correct pronunciation. The sub- ject's task would be to detect differences in the two displays and to change his pronunciation so as to make them more similar. After conducting an extensive literature search to determine the types of schemes which had previously been used to display speech sounds, 96 a basic interactive display system was programmed using the CSL's CDC 160^ computer-graphics facility. The system has been designed to be open- ended and currently can produce photographs of a variety of display types. Unfortunately, the system as it stands now cannot operate in real time due to the slowness of the CDC 160U. The simulation system was used to produce examples of several different types of displays. These displays were used in a series of preliminary tests designed to develop techniques for comparing the effec- tiveness of various types of displays. Several corrections and refinements to the testing methods are discussed. REFERENCES Abramson, Normajo, "Visual Aids for the Speech Correction of the Deaf and Hard-of-Hearing" , M.A. Thesis, Emerson College, 1952. Anderson, F. , "An Experimental Pitch Indicator for Training Deaf Scholars", J. Acoust. Soc. Am, , Vol. 32, No. 8, August i960, pp. IO65-IO7U. Barton, G. W. Jr., and Barton, S. H. , "Forms of Sounds as Shown on an Oscilloscope "by Roulette Figures", Science , Vol. l*+2, 1963, pp. 1^55-1^56. Bennett, W. R. , "The Correlatograph - A Machine for Continuous Display of Short-Term Correlation", Bell System Journal, Vol. 32, 1953, pp. 1173-1185. Beyer, M. R. , Webster, J. C. , and Dague, D. M. 
, "Revalidation of the Clinical Test Version of the Modified Rhyme Words" , J. Speech and Hearing Research , Vol. 12, I969, pp. 37^-378. Biddulph, R. , "Short Term Auto-Correllation Analysis and Correlatograms of Spoken Digits", J. Acoust. Soc. Am. , Vol. 26, No. h 3 July 195^, PP. 539-5^1. "" " " Bitzer, D. L. , and Slottow, H. G. , "The Plasma Display Panel - A Digitally Addressable Display with Inherent Memory", Fall Joint Computer Conference Proceedings-1966 , Vol. 29, Sparten, Washington, D. C, 1966, pp. 5 1 +l-5 1 +7. Bobrow, D. G. , and Klatt , D. H. , "A Limited Speech Recognition System", Fall Joint Computer Conference Proceedings-1968 , Vol. 33, Sparten, Washington, D. C. , 1968, pp. 305-318. Bower, G. H. and Trabasso, T. R. , "Concept Identification", in: Studies in Mathematical Psychology , Chap. 2, R. C. Atkinson (ed.), Stanford Univer- sity Press, Stanford, California, 1965, pp. 32-9^. Bridges, C. C. , "An Apparatus for the Visual Display of Speech Sounds", Am. J. of Psychology , Vol. 77, No. 2, June I96U, pp. 301-303. Brigham, E. 0., and Morrow, R. E. , "The Fast Fourier Transform", IEEE Spectrum , Vol. k, No. 12, December 1967 , pp. 63-70. Bruner, J. S. , Goodnow, J. J. and Austin, G. A., A Study of Thinking , Science Editions, Inc., John Wiley and Sons, New York, 1962. Campanella, S. J., Coulter, D. C. , and Speaker, D. M. , "Formant Tracking Speech Band-width Compression System Improvements", Melpar Inc., Tech. Report AFAL-TR-65-5 , AD-1+61 ^90, March 1965. Cavanagh, Anita, "A New Audio-Visual Aid for Speech", The Volt a Review , Vol. 53, No. 1, January 1951, pp. 12-13, UO-Ul. ! Chang, S., Pihl, G. E. , Essigman, M. W. , "Representations of Speech Sounds and Some of Their Statistical Properties", Proceedings of the IRE . Vol. 39, No. 2, February 1951a, pp. lVf-153. Chang, S. H. , Pihl, G. E. , and Wiren, J., "The Intervalgram as a Visual Representation of Speech Sounds", J. Acoust. Soc. Am. , Vol. 23, No. 6, November 1951b, pp. 675-679. Cochran, William T. , Cooley, J. ¥. , Favin, D. L. , Helms, H. D. , Kaenel, R. A., Lang, W. W. , Maling, G. C. , Jr. Nelson, D. E. , Rader, C. M. , and Welch, P. D. , "What is the Fast Fourier Transform?", IEEE Trans, on Audic and Electroacoustics, Vol. AU-15 , No. 2, June 1967, pp U5-55. Cohen, Martin L. , "The ADL Sustained Phoneme Analyzer", Am. Annals of the Deaf , Vol. 113, 1968, pp. 21+7-252. Conner, J. Edward, "Evaluation of the Voice Visualizer as an Aid in Teaching Voice Improvement", D. Ed. Thesis, Boston University, 1955. Cooley, James W. , and Tukey, John W. , "An Algorithm for the Machine Calculation of Complex Fourier Series", Mathematics of Computation, Vol. ■ No. 90, April 1965, pp. 297-301. - - . Coyne, A. E. , "The Coyne Voice Pitch Indicator", Teacher of the Deaf , Vol. 36, 1938a, pp. 3-U, 100-103. Coyne, A. E. "More About the Voice Pitch Indicator", The Volt a Review , Vol. Uo, No. 10, October 1938b, pp. 5U9-552, 598-599. Davis, K. H. , Biddulph, R. , and Balashek, S., "Automatic Recognition of Spoken Digits", J. Acoust. Soc. Am. , Vol. 2k, No. 6, November 1952, pp. 637-6142. Delattre, P. C. , Liberman, A. M. , and Cooper, F. S., "Acoustic Loci and Transitional cues for Consonants", J. Acoust. Soc. Am. , Vol. 27, 1955, pp. 769-773. Dolansky, L. 0., "An Instantaneous Pitch Period Indicator", J. Acoust. Soc. Am. , Vol. 27, No. 1, January 1955, pp. 67-72. Dolansky, L. , Ferullo, R. J., O'Donnell, M. C, and Phillips, N. D. , "Teaching Intonation and Inflections to the Deaf", Northeastern University, Cooperative Res. Proj . 
No. S-28l, 1965. Dolansky, L. , and Phillips, N. D. , "Teaching Vocal Pitch Patterns Using Visual Feedback From the Instantaneous Pitch-Period Indicator for Self- monitoring", Northeastern University, VRA Proj. No. 1907-S , October 1966. Dreyfus-Graf, J. , "Sur les Spectres Transitores d' elements Phonetiques Helvetica Physica Acta, Vol. 19, 19^6, pp. 1+014-1+08. Dreyfus-Graf ,J. Schweig, "The Sonograph: Elementary Principles", Arch . Angen. Wiss. Tech. , (in French), Vol. ik, December 19^8, pp. 353-362. 99 Dreyfus-Graf, J., "Le Steno-Sonographe Phonetique", Technishe Mitteilungen PTT, Vol. 28, No. 3, 1950, pp. 89-95. Dreyful-Graf, J., "Sonograph and Sound Mechanics", J. Acoust. Soc. A m. Vol. 22, No. 6, November 1950, pp. 731-739. ' Dudley, H. , and Gruenz, 0. 0. Jr., "Visible Speech Translators with External Phosphors", J. Acoust. Soc. Am. , Vol. 18, No. 1, July 19^6, pp. 62-73. Ewing, G. D. , and Taylor, John F. , "Computer Recognition of Speech Using Zero-Crossing Information", IEEE Trans, on Audio and Electroacoustics , Vol. AU-17, No. 1, March 1969, pp. 37-^0. Fabian, Fredrick E. , "Evaluation of the Voice Visualizer as a Supplementary Aid in the Correction of Articulation Disorders", E. Ed. Thesis, Boston University, 1955. Flowers, J. B. , "The True Nature of Speech - With Application to a Voice- Operated Phonographic Alphabet Writing Machine", Trans. Am. Inst, of Elect. Engin. , Vol. 35, Pt. 1, 19l6, pp. 213-2U8. Focht, L. R. and Piotrowski , C. F. , "Voice Sound Recognition", Philco Corp„, Tech. Report No. RADC-TR-66-507, AD-802 997, October 1966. Foulkes, J. D. , "Computer Identification of Vowel Types", J. Acoust. Soc. Am. , Vol. 33, No. 1, January 196l, pp. 7-11. Fry, D. B. , and Denes, P., "The Solution of Some Fundamental Problems in Mechanical Speech Recognition", Language and Speech , Vol. 1, Pt. 1, January - March 1958, pp. 35-58. Gentleman, W.M. , and Sande, G. , "Fast Fourier Transforms - For Fun and Profit", Fall Joint Computer Conference Proceedings - 1966 , Vol. 29, Sparten, Washington, D. C. , 1966, pp. 563-578. Gruenz, 0. 0. Jr., and Schott, L. 0., "Extraction and Portrayal of Pitch of Speech Sounds", J. Acoust. Soc. of Am. , Vol. 21, No. 5, September 19^9, pp. ^87-^95. Halle, M. , Hughes, G. W. , and Radley, J. P. A., "Acoustic Properties of Stop Consonants", J. Acoust. Soc. Am. , Vol. 29, No. 1, January 1957, pp. 107-116. Harris, C. M. ,and Waite, W. M. , "Display of Sound Spectrographs in Real Time", J. Acous. Soc. Am . , Vol. 35, No. 5, May 1963, p. 729. Harris, K. S. , Hoffman, H. S. , Liberman, A. M. , Delattre, P. C. , and Cooper, F. S., "Effect of Third Formant Transitions in the Perception of the Stop and Nasal Consonants", J. Acoust. Soc. Am. , Vol. 30, 1958, PP. 122-126. 100 Heygood, R. C. and Bourne, L. E. , "Attribute and Rule Learning Aspects of Conceptual Behavior", Psychological Reviews , Vol. 72, 1965, pp. 175-196. House, A. S., Goldstein, D. P., and Hughes, G. W. , "Perception of Visual Transforms of Speech Stimuli: Learning Simple Syllables", Am. Annals of the Deaf , Vol. 113, 1968, pp. 215-221. ~~ Huggins, W. H., "A Note on Autocorrelation Analysis of Speech Sounds", J. Acoust. Soc. Am. , Vol. 26, No. 5, September 195*1, pp. 790-792. Hughes, G. W. , "The Recognition of Speech by Machine", Research Laboratory of Electronics, Mass. Inst, of Tech. ,Tech. Report 395, AD-268 H89, May 1961. Hughes, G. W. , and Hemdal, J. F. 
, "Speech Analysis", Purdue Research Foundation, Lafayette, Indiana, AF Project 5628, Final Report, TR-EE65-9, AFCRL-65-68, AD 62k 555, July 1965. Jakobson, R. , Fant, C. G. M. , and Halle, M. , "Preliminaries to Speech Analysis", Acoust. Lab., Mass. Inst, of Tech., Tech. Report No. 13, 1952. Jakobson, R. , and Halle, M. , Fundamentals of Langugage, Mouton and Co., Gravenhage, Netherlands, 1956. Johnson, J. B. , "A Cathode-Ray Tube for Viewing Continuous Patterns", J. of Applied Physics, Vol. 17, No. 11, November 19^6, pp. 891-89^. Kersta, L. G. , "Amplitude Cross-Section Representation with the Sound Spectrograph", J. Acoust. Soc. of Am. , Vol. 20, No. 6, November 19^8, pp. 796-801. Kock, W. E. , and Miller, R. L. , "Dynamic Spectrograms of Speech", J. Acoust. Soc. Am. , Vol. 2k, No. 6, November 1952, pp. 783-78U. Koenig, W. , Dunn, H. K. , and Lacy, L. Y. , "The Sound Spectrograph", J. Acoust. Soc. Am. , Vol. 18, No. 1, July 19^6, pp. 19-^9. Koenig, W. , and Ruppel, A. E. , "Quantitative Amplitude Representation in Sound Spectrograms", J. Acoust. Soc. Am. , Vol. 20, No. 6, November 19^8, pp. 785-795. Kopp, G. A., and Green, H. C. , "Basic Phonetic Principles of Visible Speech", J. Acoust. Soc. Am. , Vol. 18, No. 1, July 19^6, pp. 7^-89. Kopp, G. A., and Kopp, H. G. , "Visible Speech for the Deaf", Speech and Hearing Clinic, Wayne State University, Final Report, Office of Vocational Rehabilitation, Dept. of HEW, 1963a. Kopp, G. A. , and Kopp, H. C. , "An Investigation to Evaluate Usefulness of the Visible Speech Cathode Ray Tube Translator as a Supplement to the Oral Method of Teaching Speech to Deaf and Severely-deafened Children" , Wayne State University, Final Report, Grant No. RD-526, Office of Vocational Rehabilitation, Dept HEW s 1963b. 101 Kreul, E. J., Nixon, J. C. , Kryter, K. D. , Bell, D. W. , and Lang, J. S. , "A Proposed Clinical Test of Speech Discrimination", J. Speech and Hearing Research , Vol. 11, No. 3, September 1968, pp. 536-552. Ladefoged, P., and Broadbent , D. E., "information Conveyed by Vowels", J. Acoust. Soc. Am., Vol. 29, No. 1, January 1957, pp. 98-10*+. Lenneberg, E. H. , "Biological Foundations of Language", John Wiley and Sons, Inc., New York, 1967. Lerner, Robert M. , Research Laboratory of Electronics, Mass. Inst, of Tech., Quarterly Progress Report, January 15, 1952, p. 55. Lerner, Robert M. , "A Method of Speech Compression", ScD. Thesis, Dept. E. E., Mass. Inst, of Tech., 1959. Liberman, A. M. , "Some Results of Research on Speech Perception", J. Acoust. Soc. of Am., Vol. 29, No. 1, January 1957, pp. 117-123. Liberman, A. M. , Cooper, F. S., Shankweiler, D. P., and Studdert -Kennedy, M. , "Perception of the Speech Code", Psychological Review , Vol. 7*+, No. 6, November 1967a, pp. 431-U61. Liberman, A. M. , Cooper, F. S., Studdert -Kennedy, M. , "Why Are Spectrograms Hard to Read?", Haskins Laboratories, Quarterly Progress Report, April- June 1967b, pp. 1.1-1.12. Liberman, A. M. , Delattre, P. C, , and Cooper, F. S. , "Some Cues for the Distinction Between Voiced and Voiceless Stops in Initial Position", Language and Speech, Vol. 1, Pt. 3, July^Sept ember 1958, pp. 153-167. Liberman, A. M. , Delattre, P. C, Cooper, F. S. , and Gerstman, L. J., "The Role of Consonant Vowel Transitions in the Perception of the Stop and Nasal Consonants", Psychological Monographs , Vol. 68, No. 8, Whole No. 379, 195U. Liberman, A. M. , Ingemann, F. , Lisker, L. , Delattre, P., and Cooper, F. S. , "Minimal Rules for Synthesizing Speech", J. Acoust. Soc. Am. , Vol. 31, No. 
11, November 1959, pp. 1^0-1^99 . Licklider, J. C. R., "The Intelligibility of Amplitude-Dichotomized, Time- Quantized Speech Waves", J. Acoust. Soc. Am., Vol. 22, No. 6, November 1950, pp. 820-823. Licklider, J. C. R. , and Pollack, I., "Effects of Differentiation, Integrations and Infinite Peak Clipping Upon the Intelligibility of Speech", J. Acoust. Soc. Am., Vol. 20, No. 1, January 19^8, pp. i+2-51. Majewski, W. , and Hollien, H. , "Formant Frequency Regions of Polish Vowels", J. Acoust. Soc. Am., Vol. 1+2, No. 5, November 1967, pp. 1031-1037. 102 Martony, Janos , "On the Correction of the Voice Pitch Level for Severely Hard-of-Hearing Subjects", Am. Annals of t he Deaf, Vol. 113 1968 pp. 195-202. " Mathes, R. C. , Norwine , A. C. , and Davis, K. H. , "The Cathode-Ray Sound Spectroscope", J. Acoust. Soc. Am., Vol. 21, No. 5, September lQho PP. 527-537. Noll, A. M. , "Short-Time Spectrum and 'Cepstrum' Techniques for Vocal- Pitch Detection", J. Acoust. Soc. Am., Vol. 36, No. 2, February 1964 pp. 296-302. — Noll, A. M. , "Cepstrum Pitch Determination", J. Acoust. Soc. Am. , Vol. 4l, No. 2, February 1967, pp. 293-309. Nordmann, Bernard J. Jr., "Speech Display Simulation System for a Compara- tive Study of Some Visual Speech Displays", Digital Computer Laboratory, University of Illinois, Tech. Report 470, August 1971; also as Coordinated Science Laboratory, University of Illinois, Tech. Report R-524 , September 1971. Oppenheim, Alan V., "Speech Spectrograms Using the Fast Fourier Transform". IEEE Spectrum, Vol. 7, No. 8, August 1970, pp. 57-62. Otten, K. W. , "Simulation and Evaluation of Phonetic Speech Recognition Techniques - Acoustical Characteristics of Speech Sounds Systematically Arranged in the Form of Tables" , National Cash Register Co. , RTD-TDR-63- 4005, Vol. Ill, AD-601 1*22, March 1964a. Otten, K. W„, "Simulation and Evaluation of Phonetic Speech Recognition Techniques - Indexed Bibliography on Speech Analysis, Synthesis, and Processing", National Cash Register Co., RTD-TDR-63-4005 , Vol. IV, AD-601 1+21, April 1964b. Otten, K. W. , "Simulation and Evaluation of Phonetic Speech Recognition Techniques - Summary Report", National Cash Register Co., RTD-TDR-63-4005, Vol. V, AD-602 691, April 1964c. Peterson, G. E. , "Design of Visible Speech Devices", J. Acoust. Soc. Am ., Vol. 26, No. 3, May 1954, pp. 4o6-4l3. n Phillips, Nathan D., Remillard, Wilfred, Bass, Susan, and Pronovost, Wilbert, "Teaching of Intonation to the Deaf by Visual Pattern Matching", Am. Annals of the Deaf , Vol. 113, 1968, pp. 239-246. Pickett, J. M. , ed. , "Proceedings of the Conference on Speech-Analyzing Aids for the Deaf", Am. Annals of the Deaf, Vol. 113, 1968, pp. 117-326. Pickett, James M. , "Some Applications of Speech Analysis to Communication Aid for the Deaf", IEEE Trans, on Audio and Elect oacoustics , Vol. AU-17, No. 4, Dec. 1969, pp. 283-289. 103 Pickett, J. M. , and Constam, Alfred, "A Visual Speech Trainer with Simplified Indication of Vowel Spectrum", Am. Annals of the Deaf , Vol. 113, pp. 253-258. Plant, G.R.G., "The Plant-Mandy Voice Trainer - Some Notes by the Designer", Teacher of the Deaf , Vol. 58, I960, pp. 12-15. Plomp, R. , Pols, C. W. , Van de Geer, J. P., "Dimensional Analysis of Vowel Spectra", J. Acoust. Soc. Am., Vol. 1+1, No. 3, 1967, pp. T0T-T12. Potter, R. K., "Visible Patterns of Sound", Science, Vol. 102, No. 265I+, November 9, 19^5, pp. 1+63-VfO. Potter, R. K. , "introduction to Technical Discussions of Sound Portrayal", J. Acoust. Soc. Am., Vol. 18, No. 
1, July I9I+6, pp. 1-3. Potter, R. K. , Kopp, G. A., and Green, H. C. , Visible Speech , D. Van Nostrand Co. Inc., New York, 19 1 +7. Presti, A. J., "High Speed Sound Spectrograph", J. AcOust. Soc. Am. , Vol. 1+0, No. 3, September 1966, pp. 628-63I+. Prestigiacomo, A. J., "Plastic Tape Sound Spectrograph", J. of Speech and Hearing Disorders , Vol. 22, No. 3, September 1957, pp. 321-327. Prestigiacomo, A. J., "Amplitude Contour Display of Sound Spectrograms", J. Acoust. Soc. Am. , Vol. 3.1+ , No. 11, November 1962, pp. 168H-1688. Pronovost, Wilbert , "Visual Aids to Speech Improvement", J. of Speech Disorders , Vol. 12, No. k 9 December 19^7, pp. 387-391. Pronovost, W. , "A Pilot Study of the Voice Visualizer for Teaching Speech to the Deaf", Proceedings of the International Congress on Education of the Deaf 1963 , U. S. Government Printing Office, Senate Document No. 196, 1961+. Pronovost, Wilbert, Yenkin, Linda, Anderson, D. C. , and Lerner, R. , "The Voice Visualizer", Am. Annals of the Deaf, Vol. 113, 1968, pp. 230-238. Pyron, B. 0., and Williamson, F. R. , Jr., "Visual Display of Speech by Means of Oscillographic Roulette Figures", Science , Vol. ll+5, 1961+ , PP. 72-73. Pryon, B. 0., and Williamson, F. R. , Jr., "Study and Analysis of Signal Display and Bandwidth Compression Techniques" , Georgia Institute of Tech. , Final Report project A-791, Contract DA 1+9-092-AR0-52 , AD 6l6 6kh, June 1965. Radley, J. P., "The Role of Formant Transitions in the Identification of English Stops", MS Thesis, Mass. Inst, of Tech., 1956. Ramaswamy, T. K. , and Ramakrishna, B. S. , "Simple Laboratory Setup for Obtaining Sound Spectrograms" , J. Acoust. Soc. Am., Vol. 3I+, No. 1+, April 1962, pp. 515-517. 101+ Reddy, D. R. , "An Approach to Computer Speech Recognition by Direct Analysis of the Speech Wave", Ph.D. Thesis, Computer Science Dept., Stanford University, Tech. Report CSU9, AD-6^0 836, September 1966. Restle, F. , "The Selection of Strategies in Cue Learning", Psychologi- cal Reviews , Vol. 69, No. k t July 1962, pp. 329-3^3. Riesz, R. R. , and Schott, L. , "Visible Speech Cathode-Ray Translator", J. Acoust. Soc. Am., Vol. 18, No. 1, July I9U6, pp. 50-61. Risberg, Arne , "Visual Aids for Speech Correction", Am. Annals of the Deaf , Vol. 113, 1968, pp. 178-19 U. '" " '" Risberg, A., "A Critical Review of Work on Speech Analyzing Hearing Aids", IEEE Trans, on Audio and Electoacoustics , Vol. AU-17, No. h, December 1969. Sakai, T. and Doshita, S. , "The Automatic Speech Recognition System for Conversational Sound", IEEE Trans, on Electronic Computers , Vol. EC-12, No. 6, December 1963, pp. 835-8U6. Sakai, T. , Doshita, S. , Niimi, Y. , and Tabata, K. , "Fundamental Studies of Speech Analysis and Synthesis", Am. Annals of the Deaf , Vol. 113, 1968, pp. 156-167. Sakai, T. , and Inoue, S., "New Instruments and Methods for Speech Analysis", J. Acoust. Soc. Am., Vol. 32, No. k, April i960, pp. kkl-k^Q. Stark, Rachel E. , Cullen, John K. , and Chase, Richard A., "Preliminary Work with the New Bell Telephone Visible Speech Translator" , Am. Annals of the Deaf, Vol. 113, 1968, pp. 205-21**. Steinberg, J. C. , and French, N. R. , "The Portrayal of Visible Speech", J. Acoust. Soc. Am., Vol. 18 , No. 1, July 19^6, pp. I4-I8. Stevens, K. N. , "Autocorrelation Analysis of Speech Sounds", J. Acoust. Soc. of Am. , V ol. 22, No. 6, November 1950, pp. 769-771. Teacher, C. F. , Kellett, H. G. , and Focht , L. R. 
, "Experimental Limited Vocabulary, Speech Recognizer", IEEE Trans, on Audio and Electroacoustics , Vol. AU-15, No. 3, September 1967, pp. 127-130. Teacher, C. F. , and Piotrowski, C. F. , "Voice Sound Recognition", Philco, Corporation, Tech. Report RADC-TR-65-l8>+, AD-619 9^k , July 1965. Thomas, I. B. , "The Significance of the Second Formant in Speech Intelligibility", Biological Computer Laboratory, Dept. of E. E.,Univ. of Illinois, Tech. Report No. 10, July 1966. Timberlake, Josephine B., "The Coyne Voice Pitch Indicator", The Volt a Review, Vol. Uo, No. 8, August 1938, pp. U37-U39, ^68-U69. 105 Upton, Hubert W. , "Wearable Eyeglass Speechreading Aid", Am. Annals of the Deaf, Vol. 113, 1968, pp. 222-229. Vilbig, F. , "An Apparatus for Speech Compression and Expansion and for Replaying Visible Speech Records", J. Acoust. Soc. Am. , Vol. 22, No. 6, November 1950, pp. 75^-76l. Vilbig, F. , "Visible Speech-Rotary Field Coordinate-Conversion Analyser", IRE Trans. Audio, Vol. AU-2, No. 2, March-April 195^, pp. 76-80. Williams, D. E. ,"A Visual Display of Certain Speech Parameters" , MS Thesis, U. S. Naval Postgraduate School, AD-820 518, July 1967. Willson, R. H. , "A Compacitively Coupled Bistable Gas Discharge Cell for Computer Controlled Displays", Coordinated Science Laboratory, Univ. of Illinois, CSL Report R-303, June 1966. Wood, D. E. , and Hewitt, T. L. , "New Instrumentation for Making Spectro- graphs Pictures of Speech", J. Acoust. Soc. Am. , Vol. 35, No. 8, August 1963, pp. 127^-1278. Wood, D. E. , "New Display Format and a Flexible-Time Integrator for Spectral-Analysis Instrumentation", J. Acoust. Soc. Am. , Vol. 36, No. k, April 196U, pp. 639-6^3. io6 VITA Bernard Joseph Nordmann Jr. was "born in the little town of Lawt . Oklahoma on October 28, 19^+3, the eldest of five children born to Rosita and Bernard Nordmann Sr. After a boyhood spent in wandering the nations of the earth, he spent five strenuous years in the Boston metropolitan area satisfying the requirements for the S.B. and S.M. degrees (E.E.) fro the Massachusetts Institute of Technology which he received in 19 66. Sea.: ing for a change of scenery, he next journeyed to the fabled midwest wher,' he settled in the mystical land of Champaign, Illinois. Here he worked ft the Department of Computer Science on the Illiac III computer project whi: working towards his Ph.D. degree which he completed in 1971. While working on the Illiac III project, Mr. Nordmann was res- j ponsible for the design of the main central processors used in the system He also worked for the U.S. Naval Ordnance Laboratory in fits and starts between the years 1963 and 1966. Mr. Nordmann is a member of the Institute of Electrical and Ele ■ tronic Engineers, The Association for Computing Machinery and Sigma Xi. oiAEC-427 (6/68) ; CM 3201 U.S. ATOMIC ENERGY COMMISSION UNIVERSITY-TYPE CONTRACTOR'S RECOMMENDATION FOR DISPOSITION OF SCIENTIFIC AND TECHNICAL DOCUMENT ( See Instructions on Reverse Side ) ,EC REPORT NO. ;00-21l8-002l+ JIUCDCS-R-71-U79 2. TITLE A COMPARATIVE STUDY OF SOME VISUAL SPEECH DISPLAYS 5. YPE OF DOCUMENT (Check one): [33 a. Scientific and technical report Q b. Conference paper not to be published in a journal: Title of conference Date of conference Exact location of conference. Sponsoring organization □ c. Other (Specify) ». 1ECOMMENDED ANNOUNCEMENT AND DISTRIBUTION (Check one): /Q a - AEC's normal announcement and distribution procedures may be followed. ! L] b. Make available only within AEC and to AEC contractors and other U.S. 