UIUCDCS-R-77-870
 
 UILU-ENG 77 17^5 
 
 APPLICATION OF BURST PROCESSING TO THE SPECTRAL 
 DECOMPOSITION OF SPEECH 
 
by

CHRIST JOHN XYDES

July 1977
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 
 
 
APPLICATION OF BURST PROCESSING TO THE SPECTRAL 
 DECOMPOSITION OF SPEECH 
 
 BY 
 CHRIST JOHN XYDES 
 B.S., University of Illinois 1975 
 
 THESIS 
 
 Submitted in partial fulfillment of the requirements 
 for the degree of Master of Science in Computer Science 
 in the Graduate College of the 
 University of Illinois at Urbana-Champaign, 1977 
 
 Urbana, Illinois 
 
 
 ACKNOWLEDGEMENT 
 
 The author wishes to thank his advisor, Professor W. J. Poppelbaum, 
for years of friendship and professional guidance. It has been a great
 pleasure to have been associated with him and his group. He would also 
 like to thank Professor Jane Liu for her friendship and advice. Special 
 thanks are due to the members of the Information Engineering Laboratory 
 for their companionship over the last three years; to Mr. Frank Serio 
 and Mr. Sam McDowell for help in the project's construction; to 
 Mr. Stan Zundo for the drafting; and to Ms. Cinda Robbins for the typing. 
 Finally, the author would like to express his deepest gratitude to his 
 parents and sister for their continuous support and understanding. 
 
 
TABLE OF CONTENTS

1. INTRODUCTION

2. PROPERTIES OF SPEECH
   2.1 PHYSIOLOGY OF SPEECH PRODUCTION
   2.2 UNVOICED SPEECH
   2.3 VOICED SPEECH

3. ORTHOGONAL REPRESENTATIONS
   3.1 ORTHOGONAL EXPANSIONS
   3.2 FOURIER SERIES EXPANSION
   3.3 COMPRESSION PROPERTY

4. DIGITAL COMPUTATION
   4.1 TIME TRUNCATION
   4.2 SAMPLING
   4.3 QUANTIZATION

5. BURST PROCESSING
   5.1 BURST CONCEPTS
   5.2 BURST ENCODING AND DECODING
   5.3 BURST MULTIPLICATION

6. BURST FOURIER TRANSFORMER
   6.1 INTRODUCTION
   6.2 FUNDAMENTAL PERIOD DETECTION
   6.3 HARMONIC SELF SAMPLING
   6.4 SERIAL VS. PARALLEL
   6.5 ASYNCHRONOUS PULSE MULTIPLIER
   6.6 COEFFICIENT COMPUTATION
   6.7 INCREASED COMPUTATION ACCURACY

7. CONCLUSION

REFERENCES

APPENDIX
   CIRCUIT DRAWINGS
 
LIST OF FIGURES

Figure
 1. Speech Apparatus
 2. Voiced Speech Production
 3. Converging Approximations
 4. Spectral Distortion
 5. Block Sum Register
 6. Burst Encoder
 7. Processor Block Diagram
 8. Fundamental Period Detection
 9. Harmonic Self Sampling
10. Transform Unit
11. Serial Implementation
12. Burst Interpolation
13. Asynchronous Pulse Multiplier
14. Coefficient Computation
15. Squaring Connections
16. Square Root Connections
17. Burst Addition
18. Coefficient Logic
19. Increased Accuracy
20. MSE vs Length
 
 
 1. INTRODUCTION 
 
 This investigation deals with the spectral decomposition of speech 
 waveforms. The motivation for such an operation is the applicability 
 to areas such as speech compression. A large body of references on 
 applications of various transforms to speech processing can be found. 
 [9, 10, 11, 15] 
 
 The major shortcoming of transform processing has been the 
 complexity of implementation. A unique solution to the problem 
is proposed which utilizes advantages present in Burst Processing. [3]
 The feasibility of using such an unconventional representation is demonstrated 
 and shown to be preferable to conventional binary implementations. The 
 inherent properties of speech have been exploited throughout in an 
 attempt to minimize the hardware. 
 
 
 2. PROPERTIES OF SPEECH 
 
 2.1 PHYSIOLOGY OF SPEECH PRODUCTION 
 
 Speech is the result of voluntary, formalized motions of the 
 respiratory and masticatory apparatus. It is a skill which must be 
 learned and developed. Control is aided by the acoustic feedback of 
 the hearing mechanism. Figure 1 illustrates the parts of the human 
 anatomy relevant to speech production. 
 
 The vocal tract is an acoustical tube which acts as a filter on 
 the excitation functions of speech. It is terminated by the lips on one 
 end and by the vocal cords at the top of the trachea on the other end. 
 The cross sectional area is nonuniform and may be varied by movement 
 of the lips, jaw, tongue, and velum. 
 
An ancillary path for speech production is provided by the
 nasal tract. It extends from the velum to the nostrils. Acoustic 
 coupling between the nasal and vocal tracts is controlled by the size 
 of the opening at the velum. As is well known, nasal coupling can 
 substantially influence the characteristics of the sound produced. 
 
 The source of energy for speech lies in the air flow out of the 
 lungs. As air is forced out, it passes through the trachea into the 
 throat cavity. At the top of the trachea one finds the vocal cords 
 and glottis. It is the degree of activity of the vocal cords which 
 determines whether "voiced" or "unvoiced" speech is produced. 
 
Figure 1. Speech Apparatus (labeled parts: velum, esophagus, nasal cavity, oral cavity, tongue, vocal cords, trachea)
 
2.2 UNVOICED SPEECH 
 
 Unvoiced sounds are produced by a turbulent flow of air at some 
 point of stricture in the vocal tract. An acoustic noise is generated 
 which provides an incoherent excitation for the vocal system. The 
 spectrum of the noise near its point of generation is relatively broad 
and uniform. The vocal cavities forward of the constriction are usually
 the most influential in spectrally shaping the sound. The fact that the 
 vocal cords do not participate in the creation of unvoiced speech is 
 the key observation. 
 
 2.3 VOICED SPEECH 
 
 Voiced sounds are produced by the vibratory action of the vocal 
 cords. The relatively massive tensed vocal cords are initially contiguous. 
 The subglottal pressure is then increased enough to force them apart, 
 producing a lateral acceleration. As the air flow increases, the local 
 pressure is reduced, and the cords are returned toward their original 
 position. As this occurs, the pressure builds up and the cycle is 
 repeated. 
 
 The period of oscillation of the vocal cords is determined by their 
 mass and compliance. This period is usually shorter than the natural 
 period of the cords; thus, it is a forced oscillation. 
 
The orifice produced by the vibrating cords breaks up the steady
 air flow into short, quasi-periodic pulses of air. These pulses are 
 used to excite the acoustic system above the vocal cords. The volume 
 flow of air through the glottis as a function of time is roughly 
 triangular in shape and exhibits duty factors on the order of 0.3 to 
0.7. Thus, the glottal air flow is rich in harmonics and overtones.
 
A simplified block diagram for the production of voiced sounds
is shown in Figure 2. The output signal $S_v(t)$ appearing at the lips
is the convolution of the excitation function $e(t)$, corresponding to
the air flow at the vocal cords, with the impulse response $v(t)$ of the
filter representing the vocal tract.

$$S_v(t) = \int_{-\infty}^{\infty} e(k)\, v(t-k)\, dk \tag{2.1}$$

In the frequency domain, this corresponds to the product

$$S_v(f) = E(f) \cdot V(f) \tag{2.2}$$

The amplitude spectrum of the speech signal is obtained by taking the
magnitudes of the functions.

$$|S_v(f)| = |E(f)| \cdot |V(f)| \tag{2.3}$$
 
This process may also be considered from a Fourier decomposition
point of view. Writing the source signal as

$$C_v = \sum_{h=1}^{H} A_h \cos(hFt + \theta_h) \tag{2.4}$$

we consider H audible harmonics, each with its own amplitude $A_h$,
frequency $hF$ ($F = 1/T$, the fundamental frequency), and phase $\theta_h$.
Information is transmitted through the following modulation processes of
the vocal tract:

1) Starting and stopping of the source, represented
   by the function $s(t)$.

2) Variation of the instantaneous fundamental frequency,
   represented by replacing $Ft$ with $F\int_0^t i(t)\,dt$,
   where $i(t)$ is the inflection factor.

3) Filtering effects of the vocal tract, represented
   by $v(t)$.
 
Figure 2. Voiced Speech Production
In normal voiced speech, all three factors are present simultaneously,
giving a waveform represented by

$$S_v(t) = s(t) \sum_{h=1}^{H} v(t)\, A_h \cos\!\left(hF\int_0^t i(t)\,dt + \theta_h\right) \tag{2.5}$$

As stated previously, the amplitude spectrum of the speech signal,
represented by $|S_v(f)|$, is obtained by taking the magnitude of the
transform of $S_v(t)$.
 
 
 3. ORTHOGONAL REPRESENTATIONS 
 3.1 ORTHOGONAL EXPANSIONS 
 
A set $P$ of arbitrary functions is said to be orthogonal over the
interval $t_1 < t < t_2$ if

$$\int_{t_1}^{t_2} P_i(t)\, P_j(t)\, dt = \begin{cases} c & i = j \\ 0 & i \neq j \end{cases} \tag{3.1}$$

If the constant $c$ is equal to one, the set of functions is said to be
orthonormal.
 
Suppose $S_v(t)$ is a real valued function defined on the interval
$(t_1, t_2)$. It can be represented by the expansion

$$S_v(t) = \sum_{h=0}^{\infty} a_h P_h(t) \tag{3.2}$$

To evaluate the k-th coefficient $a_k$, one multiplies both sides of
Eq. (3.2) by $P_k(t)$ and then integrates over the interval $(t_1, t_2)$.

$$\int_{t_1}^{t_2} S_v(t)\, P_k(t)\, dt = \int_{t_1}^{t_2} \sum_{h=0}^{\infty} a_h P_h(t)\, P_k(t)\, dt \tag{3.3}$$

Applying Eq. (3.1) to Eq. (3.3), every term with $h \neq k$ vanishes and we obtain

$$a_k = \frac{1}{c} \int_{t_1}^{t_2} S_v(t)\, P_k(t)\, dt \tag{3.4}$$
 
$S_v(t)$ may be approximated by limiting the series in Eq. (3.2) to
the first H terms. The amount of distortion introduced by this
approximation depends on the characteristics of the function. An example
of converging approximations is shown in Figure 3 with P = [sin x,
(sin 3x)/3, (sin 5x)/5, (sin 7x)/7].
 
Figure 3. Converging Approximations 
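
The convergence behavior behind Figure 3 can be illustrated numerically. The sketch below is not taken from the thesis. It assumes the target function is the square wave of height pi/4 on 0 < x < pi, which is the known limit of the series sin x + (sin 3x)/3 + (sin 5x)/5 + ..., and the function names and the number of evaluation points are illustrative choices.

```python
import math

def partial_sum(x, terms):
    """Sum the first `terms` members of P: sin x + sin(3x)/3 + sin(5x)/5 + ..."""
    return sum(math.sin((2 * n + 1) * x) / (2 * n + 1) for n in range(terms))

def mean_squared_error(terms, points=1000):
    """MSE between the partial sum and its assumed limit (pi/4 on 0 < x < pi)."""
    target = math.pi / 4
    # Midpoint samples avoid the discontinuities at x = 0 and x = pi.
    xs = [math.pi * (i + 0.5) / points for i in range(points)]
    return sum((partial_sum(x, terms) - target) ** 2 for x in xs) / points

if __name__ == "__main__":
    for terms in (1, 2, 3, 4):
        print(terms, mean_squared_error(terms))
```

Each added term lowers the mean squared error, which is the sense in which the approximations converge.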
 
 3.2 FOURIER SERIES EXPANSION 
 
If one considers the special case where $P = [1, \sin 2\pi t/T,
\cos 2\pi t/T, \sin 4\pi t/T, \cos 4\pi t/T, \ldots, \cos 2\pi h t/T]$ with t over the
interval [0,T], the Fourier Series is obtained with coefficients $a_h$ and $b_h$ as
defined below.

$$a_h = \frac{2}{T} \int_0^T S_v(t) \sin\frac{2\pi h t}{T}\, dt, \qquad b_h = \frac{2}{T} \int_0^T S_v(t) \cos\frac{2\pi h t}{T}\, dt \tag{3.5}$$

where $S_v(t)$ is defined as

$$S_v(t) = \sum_{h=0}^{\infty} \left[ b_h \cos\frac{2\pi h t}{T} + a_h \sin\frac{2\pi h t}{T} \right] \tag{3.6}$$

There is a large body of information describing the various properties of
Fourier Series. [14]

The magnitude of the Fourier coefficient for the h-th harmonic
is defined as

$$|S_v(h)| = \sqrt{a_h^2 + b_h^2} \tag{3.7}$$

This quantity describes the contribution of the h-th harmonic to the
overall signal. Application of such information can be found in a
variety of fields, most notably that of signal processing.
 
 
 3.3 COMPRESSION PROPERTY 
 
It is a well known fact that orthogonal transformations of signals
offer a potential reduction in the bit rate necessary for transmission.
 [1] This ability follows directly from the fact that the magnitudes of 
 the orthogonal coefficients are a strong function of their order. Most 
 of the information is concentrated in the lower coefficients. Thus 
 the number of bits required to represent each coefficient can vary. 
 
 If compression is to be achieved using orthogonal transformations 
 when compared to standard pulse code modulation (PCM), the average 
 number of bits per coefficient must be less than the number of bits 
 per PCM sample. However, the signal to noise ratio must be kept 
 constant to allow comparison. 
 
 The necessary relationships between the number of bits per 
 coefficient and the variance of the coefficient have previously been 
 derived. [1,2] Once the variances of the coefficients are measured, 
 the required number of bits can be computed. This was done using 
 samples of speech for the first 16 coefficients of three different 
 transformations — Fourier, Hadamard, and Karhunen-Loeve. [1] 
 
 The results may be summarized by listing the transformations 
 in order of decreasing performance: Karhunen-Loeve, Fourier, and 
 Hadamard. To equal the SNR of standard 56 Kbit/sec PCM, the Karhunen- 
 Loeve transform required 42.5 Kbit/sec, Fourier required 46 Kbit/sec, 
 and the Hadamard required 48.5 Kbit/sec. The maximum difference of 
 6 Kbit/sec between transforms represents only a 14% increase. When 
 considering the complexity of implementation, one might consider such a 
 small degradation acceptable. 
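
The arithmetic behind these figures can be checked directly. The following sketch uses only the bit rates quoted above. The percentage savings relative to PCM are derived here for illustration and are not quoted in the text.

```python
# Bit rates (Kbit/sec) quoted above for equal SNR.
PCM_RATE = 56.0
TRANSFORM_RATES = {"Karhunen-Loeve": 42.5, "Fourier": 46.0, "Hadamard": 48.5}

def saving_vs_pcm(rate):
    """Percentage bit-rate reduction relative to 56 Kbit/sec PCM (derived value)."""
    return 100.0 * (PCM_RATE - rate) / PCM_RATE

# Largest gap between transforms, and that gap relative to the best transform.
spread = max(TRANSFORM_RATES.values()) - min(TRANSFORM_RATES.values())
relative_increase = 100.0 * spread / min(TRANSFORM_RATES.values())

if __name__ == "__main__":
    for name, rate in TRANSFORM_RATES.items():
        print(name, rate, saving_vs_pcm(rate))
    print(spread, relative_increase)
```

The 6 Kbit/sec spread divided by the 42.5 Kbit/sec Karhunen-Loeve rate gives the quoted increase of about 14%.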
 
 Comparisons of signal-to-quantizing-noise ratios for the three 
 transforms at various bit rates are also known. [1] The relative 
 performance is the same as observed earlier. Thus, one can conclude 
 that bit-rate savings can be achieved at the expense of increased processing 
 complexity required by the orthogonal transformations. An alternate 
 result is that such transformations will allow increased SNR for a 
 fixed bit rate. 
 
 
 4. DIGITAL COMPUTATION 
Often it is desirable to evaluate Eq. (3.7) using digital technology.
If one chooses to enter the digital domain, Eq. (3.5) and Eq. (3.7)
can never be evaluated precisely. The main factors preventing infinite
precision are

1) $S_v(t)$ is observed through a finite time window
   (time truncation)

2) $S_v(t)$ is sampled at discrete instants in time

3) $S_v(t)$ is quantized to a fixed number of levels

4) Any machine possesses only finite precision
 
 4.1 TIME TRUNCATION 
 
A machine can only deal with a finite portion of a signal at any given
time. The effect of this window, through which the machine sees the world,
is to limit the frequency resolution of the analysis. If the window is
T seconds long, only spectral components separated by at least 1/T Hz can
be resolved.

The Fourier transform of the unit amplitude data window is of the
form sin x/x. If a sinusoidal input of frequency $f_0$ is considered, the
spectrum obtained would be

$$\frac{\sin(\pi (f - f_0) T)}{\pi (f - f_0) T}$$
 
 In the processor to be described in section 6, a rectangular window 
 of size T is always used. One can use this fact to analyze the amplitude 
 distortion introduced on the resultant spectrum. If one considers the 
 input speech to have a flat spectrum starting at F, a convolution of this 
spectrum with the sin x/x response of the window can be performed. The
result, shown in Figure 4, indicates the amount of distortion one may
 expect. Since this distortion is stationary with respect to F, it can 
 easily be compensated for before further processing is undertaken. 
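
The sin x/x response of the rectangular window can be evaluated with a few lines of code. This is a minimal sketch. The window length and input frequency in the demonstration are arbitrary illustrative values, not parameters of the prototype.

```python
import math

def window_response(f, f0, T):
    """Normalized spectrum sin(x)/x, x = pi (f - f0) T, of a sinusoid at f0
    observed through a rectangular window of length T seconds."""
    x = math.pi * (f - f0) * T
    return 1.0 if x == 0 else math.sin(x) / x

if __name__ == "__main__":
    T = 0.01      # hypothetical 10 ms window
    f0 = 100.0    # hypothetical input frequency in Hz
    for offset in (0.0, 0.5 / T, 1.0 / T, 1.5 / T):
        print(offset, window_response(f0 + offset, f0, T))
```

The response is unity at the input frequency and falls to zero at an offset of 1/T Hz, which is why components closer than 1/T cannot be separated.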
 
 4.2 SAMPLING 
 
 Ideal sampling involves observing a signal only at discrete 
 instants in time. Usually, these samples are equally spaced in time - 
separated by Δt seconds. The sampling function is represented in the
Fourier domain as a train of impulses of strength Δt, each 1/Δt apart.
Multiplying the input signal by the sampling function corresponds to a
convolution of their transforms. This will amount to repeating the input
signal's transform around each multiple of 1/Δt. It is well known that
 if the sampling rate is at least twice the highest frequency present 
 in the original signal, so-called aliasing will be prevented. 
 
 In section 3.2 we defined the Fourier coefficients for continuous 
 signals - Eq. (3.5). Considering a sampled signal, one can define the pair 
 of Discrete Fourier Coefficients 
 
$$a_h = \sum_{k=0}^{d-1} S_v(k\Delta t) \sin\frac{2\pi h k \Delta t}{T}, \qquad b_h = \sum_{k=0}^{d-1} S_v(k\Delta t) \cos\frac{2\pi h k \Delta t}{T}, \qquad \Delta t = T/d \tag{4.1}$$
 
 where d is the number of points in the discrete transform. 
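
Eq. (4.1) and the magnitude of Eq. (3.7) translate almost directly into code. The sketch below is illustrative only. The function names and the test signal are assumptions, and the sums are left unscaled exactly as in Eq. (4.1).

```python
import math

def discrete_coefficients(samples, h):
    """Unscaled sums of Eq. (4.1) for harmonic h.
    `samples` holds S_v(k*dt) for k = 0..d-1, with dt = T/d, so that
    2*pi*h*k*dt/T reduces to 2*pi*h*k/d."""
    d = len(samples)
    a = sum(s * math.sin(2 * math.pi * h * k / d) for k, s in enumerate(samples))
    b = sum(s * math.cos(2 * math.pi * h * k / d) for k, s in enumerate(samples))
    return a, b

def magnitude(a, b):
    """Eq. (3.7): sqrt(a^2 + b^2)."""
    return math.hypot(a, b)

if __name__ == "__main__":
    d = 32
    # Hypothetical test signal: 2nd harmonic (sine) plus 5th harmonic (cosine).
    signal = [3 * math.sin(2 * math.pi * 2 * k / d) + math.cos(2 * math.pi * 5 * k / d)
              for k in range(d)]
    for h in (1, 2, 5):
        a, b = discrete_coefficients(signal, h)
        print(h, magnitude(a, b))
```

For a d-point transform, a harmonic of amplitude A contributes A*d/2 to the corresponding unscaled sum, so the 2nd and 5th lines of the example come out as 48 and 16.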
 
 4.3 QUANTIZATION 
 
 To represent an input sample with continuous values using a finite 
 precision machine, the sample value must be mapped into a representable 
 value. The noise introduced in this process is due to the fact that many 
input values are mapped into one output value. It is a well known fact
that if the quantization error is treated as zero mean white noise, the
noise power produced is of the form

$$N_q = q^2/12 \tag{4.2}$$

where q is the step size.

Figure 4. Spectral Distortion
 
This noise power enables one to determine the smallest input
component discernible in the quantizer's spectrum. The quantizer output
must have a spectral density that exceeds $N_q$. One can define the Dynamic
Range of the quantizer as

$$DR = -10 \log_{10} [q^2/12] \ \text{db} \tag{4.3}$$

For 16 levels of quantization, a Dynamic Range of 34.9 db is observed.
Alternately, one can define the mean squared error (MSE) as

$$MSE = 10 \log_{10} [q^2/12] \ \text{db} \tag{4.4}$$
 
 This is the difference between the spectra of the input and output of the 
 quantizer. Obviously, for 16 levels, there is a MSE of -34.9 db. 
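
The 34.9 db figure follows from Eq. (4.2) through Eq. (4.4) once the step size is fixed. The sketch below assumes a full-scale range of one, so that q = 1/b for b levels, which is the relation used later in this section. The function names are illustrative.

```python
import math

def noise_power(levels):
    """Eq. (4.2): N_q = q^2 / 12, with step size q = 1/levels (unit full scale)."""
    q = 1.0 / levels
    return q * q / 12.0

def dynamic_range_db(levels):
    """Eq. (4.3): DR = -10 log10(q^2 / 12)."""
    return -10.0 * math.log10(noise_power(levels))

def mse_db(levels):
    """Eq. (4.4): MSE = 10 log10(q^2 / 12), the negative of the dynamic range."""
    return 10.0 * math.log10(noise_power(levels))

if __name__ == "__main__":
    print(dynamic_range_db(16), mse_db(16))
```

For 16 levels, q^2/12 = 1/3072, and 10 log10(3072) is approximately 34.9.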
 
If one assumes a uniform spectrum for the speech signals with $H_{max}$
harmonic components at the input, a worst case signal to noise ratio can
be derived for each Fourier coefficient. Using the fact that q=1/b, consider
the case when all $H_{max}$ components contribute an equal amount. The signal
to noise ratio per coefficient becomes

$$SNR = \frac{12\, b^2}{H_{max}} \tag{4.5}$$

Under these assumptions, if H Fourier coefficients are used to represent
the speech signal, the total signal to quantization noise ratio becomes

$$SNR_T = \frac{12\, b^2 H}{H_{max}} \tag{4.6}$$
 
 5. BURST PROCESSING 
 
 5.1 BURST CONCEPTS 
 
 It has previously been the accepted practice to represent quantized 
signals as binary data words. Such a PCM scheme requires $\log_2 b$ bits for
b levels of quantization. In 1974, an alternative was proposed by
 W. J. Poppelbaum. [3] Instead of representing b levels in a binary 
 fashion, it was proposed to utilize a unary scheme and represent the 
 b levels with b equally weighted bits (Burst digits). Such a reduction 
 in precision may be counteracted by appropriate averaging. [16] 
 
 During the past two years, members of the Information Engineering 
 Laboratory of the University of Illinois have been investigating the 
 properties and applicability of such a representation. Designed as a 
 compromise between stochastic processing and weighted binary, Burst exhibits 
 simplicity and acceptable accuracy for applications where time averaging 
 is allowed. The hardware complexity of Burst is an order of magnitude 
greater than that of stochastics. However, it is an order of magnitude
 less than that of weighted binary. Applicable areas include AM demodulation, 
 FM demodulation, and video transmission. [4,5,6,7,8] 
 
 5.2 BURST ENCODING AND DECODING 
 
 The digital encoding of an analog signal into the Burst domain is 
 quite simple. Many variations of encoders have been demonstrated. [8] 
 The fundamental building block common to all schemes is the Block Sum 
 Register (BSR), shown in Figure 5. Consisting of a b-bit shift register 
 connected to b current sources, this particular implementation uses 
negative logic. Each current source is activated by a 0 in the
 corresponding bit position. The total current is summed on a common 
 bus producing a quantized-analog output. 
 
Figure 5. Block Sum Register
 
 A Burst encoder may be implemented as shown in Figure 6. The analog 
 signal is compared to a staircase waveform generated by a BSR. If the 
 analog input is greater than the present value of the staircase, a 1 is 
produced at the output; otherwise a 0 is produced. Thus, after b clock
 periods, a new Burst sample is produced. It is compacted in the sense that 
 all ones are adjacent to each other at one end of the sample. If the 
 BSR uses negative logic, the two inputs of the comparator are switched. 
 
 It is obvious that the number of ones produced is directly proportional 
 to the magnitude of the input signal. The step size q of the staircase 
 is dependent on the maximum amplitude of the analog signal. It is chosen 
 so that the peak-to-peak variation of the input rarely exceeds bq. The 
 effects of not using a sample-and-hold at the signal input have previously 
 been discussed. [5,8] For improved performance, one may elect to use a sample- 
 and-hold at the analog input. 
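
The encoding procedure described above can be sketched in software. This is an illustration, not a model of the actual circuit. It assumes positive logic, a staircase with levels q, 2q, ..., bq, and an input that is held constant during the b clock periods, as a sample-and-hold would provide. All function names are hypothetical.

```python
def burst_encode(value, b, q):
    """Compare a held analog value with a b-step staircase (levels q, 2q, ..., bq).
    One bit is produced per clock period: 1 while the input exceeds the staircase."""
    return [1 if value > (step + 1) * q else 0 for step in range(b)]

def burst_decode(bits, q):
    """Quantized-analog value of a Burst: each 1 contributes one unit of weight q,
    mirroring the current summation performed by a Block Sum Register."""
    return q * sum(bits)

if __name__ == "__main__":
    b, q = 16, 1.0 / 16
    bits = burst_encode(0.53, b, q)
    print(bits, burst_decode(bits, q))
```

Because the staircase rises monotonically, the ones in the resulting Burst are adjacent at one end of the sample, and the decoded value differs from the input by less than one step q.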
 
 5.3 BURST MULTIPLICATION 
 
 Burst multiplication may be implemented in the digital or quasi- 
 analog domain. The latter implementation was chosen for reasons which 
 will become obvious later. Referring to Figure 5, the voltage V serves 
 as a weighting factor for the stored Burst. Increasing V will increase 
 the quantized analog value present on the current summing bus. Thus, 
 multiplication can be performed without any increase in digital hardware. 
 
 This key result is critical to the hardware realization to be 
 presented. It is well known that the complexity of conventional FFT 
 processors using binary representation is largely due to the required 
 multiplications and additions. [12] It will be shown that Burst allows 
 such operations to be performed in a highly parallel manner. 
 
Figure 6. Burst Encoder
 
6. BURST FOURIER TRANSFORMER
 
 6.1 INTRODUCTION 
 
 Given the previous background information, a detailed description 
 of the prototype machine is possible. Figure 7 shows a general block 
 diagram of the processor. The speech signal enters an analog front end 
 which performs two functions. The signal is initially passed through 
 an amplifier with a gain of 2.5 to obtain a signal capable of being 
 processed. Since the signal is locally accessible, it was decided to 
 use automatic gain control instead of adaptive encoding. The amplified 
 signal enters an AGC circuit and also the pitch detection circuit 
 described in section 6.2. 
 
 The Asynchronous Pulse Multiplier (APM) generates the appropriate 
 sampling clock given the beginning of each fundamental period. This 
 clock is used to drive the transform unit which performs the multiplica- 
 tions indicated in Eq. (4.1). The resulting coefficients are then 
 used to compute the magnitude of the spectral component. 
 
 6.2 FUNDAMENTAL PERIOD DETECTION 
 
 The problem of detecting the fundamental pitch period of speech is 
 highly complex. In fact, a complete solution is yet to be found. The 
 main difficulty is that voice pitch is not a clearly defined attribute. 
 Precisely what epochs of the speech waveform should be chosen for period 
 measurement is not clear. 
 
 Most pitch extraction methods attempt to identify the epoch of each 
 glottal puff. Describing the periodicity of the signal, inverse 
 filtering techniques, or measuring the fundamental component are common 
 approaches. The most promising of these is the so-called cepstrum 
 technique. [10] However, the complexity of such an approach is overwhelming 
 for many applications. 
 
Figure 7. Processor Block Diagram
 
 The pitch detection method implemented takes advantage of the rapid 
 initial rise of the speech waveform. A search waveform performs a linear 
 search for the speech peak. In order to remove the effects of transients, 
 the search waveform is reset after a delay of 0.7 ms. The reset level is 
 set to a fixed quantity above the peak of the speech. Thus, the search 
 waveform will track the speech amplitude. 
 
 At the point of each detection, an output pulse is generated to 
 signify the beginning of the pitch period. A pulse duration of 2.4 ms 
 is used to mask off possible retriggering by peaks within the same 
 fundamental period. Such a mask performs the function of a low pass 
 filter with a cutoff frequency of 417 Hz. Figure 8 shows an actual 
 trace of the circuit in operation. 
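
The effect of the 2.4 ms output pulse can be illustrated with a simple hold-off rule. The sketch below does not model the search waveform. It assumes that candidate peak times are already available and that the mask is measured from the last accepted detection. The function name and the example times are hypothetical.

```python
MASK_SECONDS = 2.4e-3  # output pulse duration quoted above

def mask_retriggers(peak_times, mask=MASK_SECONDS):
    """Keep a candidate peak only if at least `mask` seconds have elapsed
    since the last accepted detection."""
    accepted = []
    for t in sorted(peak_times):
        if not accepted or t - accepted[-1] >= mask:
            accepted.append(t)
    return accepted

if __name__ == "__main__":
    # Hypothetical peak times in seconds: pitch pulses every 8 ms with
    # secondary peaks inside each period.
    candidates = [0.0, 0.001, 0.002, 0.008, 0.009, 0.016]
    print(mask_retriggers(candidates))
    print(1.0 / MASK_SECONDS)
```

Since no two detections can be closer than 2.4 ms, the highest pitch that can be reported is 1/(2.4 ms), or approximately 417 Hz.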
 
Personal observations indicate a high degree of tracking. The
changing phase of the signal may introduce a frequency modulation
effect. Although this may be slightly bothersome, it will not prevent
intelligibility.
 
 6.3 HARMONIC SELF SAMPLING 
 
The problem of convolving the speech signal $S_v(t)$ with sin hωt and
cos hωt for the h-th harmonic is fundamental to the calculation of
$|S_v(h)|$. An alternate approach is the idea of Harmonic Self Sampling. [17]
If one divides a given period of $S_v(t)$ into h+1 equal segments, only one
period of a sine and cosine waveform is required. One merely performs
h+1 partial convolutions of each segment with the sine and cosine.
Summing over these partial convolutions and scaling appropriately, one
obtains the coefficients $a_h$ and $b_h$. This is illustrated in Figure 9.
 Using this idea, Eq. (4.1) now becomes 
 
Figure 8. Fundamental Period Detection

Figure 9. Harmonic Self Sampling
 
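$$a_h = \sum_{n=0}^{h} \sum_{k=0}^{d-1} S_v\!\left(\frac{nT}{h+1} + \frac{k\Delta t}{h+1}\right) \sin\frac{2\pi k \Delta t}{T}, \qquad b_h = \sum_{n=0}^{h} \sum_{k=0}^{d-1} S_v\!\left(\frac{nT}{h+1} + \frac{k\Delta t}{h+1}\right) \cos\frac{2\pi k \Delta t}{T} \tag{6.1}$$

where Δt = T/d.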
 
 The motivation behind such an approach is that the weighting voltages 
 on a row of BSR's can be adjusted to simulate a given waveform. The 
 transform unit, shown in Figure 10, consists of two rows of 32 BSR's, one 
 row weighted with a sine wave, the other weighted with a cosine wave. By 
 adjusting the input sampling rate appropriately, these voltages remain 
 stationary. Each Burst encoded subsection of speech is passed through 
 these two rows. After the complete subsection is present, the current 
 output is observed. 
 
 Using this implementation, the hardware complexity of standard 
 Fourier transformers is circumvented. Two rows of BSR's with appropriate 
 voltage sources replace the required complex arithmetic units. Storage 
 elements are required independent of the type of processing techniques 
implemented. Using weighted binary, each of the registers requires $\log_2 b$
 bits. However, additional storage is required for the complex constants 
 involved. The indexing and control hardware required for the complex 
 arithmetic unit must also be considered. [12] In comparison, the 
 increase in hardware needed to perform the required convolutions in the 
 Burst implementation is almost negligible. 
 
 Due to these parallel multiplications and additions, the number of 
 computations is also reduced to a minimum. With regard to Eq. (6.1), the 
inner summation is performed in one step. Thus, there are on the order of h
computations for the h-th harmonic and a total of (H+1)(H+2)/2 computations
for a complete spectrum of H+1 harmonic lines.
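
The idea of Harmonic Self Sampling can be checked numerically. The sketch below is an illustration under stated assumptions, not the prototype's implementation. It writes m = h+1 for the number of segments, treats the index h = 0 as the fundamental, uses floating point samples in place of Bursts, and leaves the sums unscaled as in Eq. (4.1). All names are hypothetical.

```python
import math

def direct_coefficients(signal, T, m, d):
    """Eq. (4.1)-style sums for the harmonic with m cycles per period,
    using m*d samples across the period."""
    n_samples = m * d
    a = b = 0.0
    for j in range(n_samples):
        t = j * T / n_samples
        a += signal(t) * math.sin(2 * math.pi * m * t / T)
        b += signal(t) * math.cos(2 * math.pi * m * t / T)
    return a, b

def self_sampled_coefficients(signal, T, m, d):
    """Split the period into m segments and reuse a single d-point table
    holding one period of sine and cosine for every segment."""
    sine = [math.sin(2 * math.pi * k / d) for k in range(d)]
    cosine = [math.cos(2 * math.pi * k / d) for k in range(d)]
    a = b = 0.0
    for n in range(m):                      # one partial convolution per segment
        for k in range(d):
            s = signal((n * d + k) * T / (m * d))
            a += s * sine[k]
            b += s * cosine[k]
    return a, b

def total_partial_convolutions(H):
    """Harmonic index h needs h+1 partial convolutions, for h = 0..H."""
    return sum(h + 1 for h in range(H + 1))

if __name__ == "__main__":
    T = 0.01
    wave = lambda t: 2 * math.sin(2 * math.pi * 3 * t / T + 0.4) + 0.5 * math.cos(2 * math.pi * t / T)
    print(direct_coefficients(wave, T, 3, 32))
    print(self_sampled_coefficients(wave, T, 3, 32))
    print(total_partial_convolutions(7))
```

Both routines give the same sums because the sine and cosine repeat exactly once per segment, so a fixed table (the stationary weighting voltages) suffices when the sampling rate is scaled by h+1. The count for H = 7 agrees with (H+1)(H+2)/2 = 36.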
 
Figure 10. Transform Unit

 The time delay involved in the calculation approaches zero. As data 
 serially enters the processor, the required partial convolutions are 
 performed on-line and the results are accumulated. After the final 
convolution, $|S_v(h)|$ is computed. The time required for this computation
 is the total delay encountered. 
 
 6.4 SERIAL VS. PARALLEL 
 
 Due to the highly redundant nature of speech, if one is only 
 interested in a small number of coefficients, a single coefficient may 
 be computed each fundamental period. For H coefficients, this would 
 require H periods, as shown in Figure 11. Assuming the use of a d-point 
transform unit, with each point consisting of b-bit Bursts, we obtain
the following results. For a given harmonic h (h=0 to 7), the fundamental
period T is divided into bd(h+1) samples. Thus, the input sampling rate
is (bd(h+1))/T samples per second.
 
 The output, consisting of a number of spectral lines (8 in this 
implementation), is pitch synchronous. Using a range of 50 Hz to
250 Hz for the fundamental frequency, one obtains a rate of 6.25 spectra
 per second to 31.25 spectra per second. This corresponds to 800 to 4000 
 Burst digits per second, or an equivalent 200 to 1000 binary digits per 
 second. 
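
These output rates can be reproduced with simple arithmetic. The sketch assumes 8 spectral lines per spectrum, one fundamental period per line (the serial approach), 16-digit Bursts, and log2(16) = 4 binary digits as the equivalent of one Burst. The Burst length of 16 is inferred from the quoted rates and is an assumption here.

```python
import math

LINES_PER_SPECTRUM = 8   # spectral lines computed, one per fundamental period
BURST_LENGTH = 16        # assumed Burst digits per spectral line

def output_rates(fundamental_hz):
    """Return (spectra/sec, Burst digits/sec, equivalent binary digits/sec)."""
    spectra = fundamental_hz / LINES_PER_SPECTRUM
    burst_digits = spectra * LINES_PER_SPECTRUM * BURST_LENGTH
    binary_digits = spectra * LINES_PER_SPECTRUM * math.log2(BURST_LENGTH)
    return spectra, burst_digits, binary_digits

if __name__ == "__main__":
    for f0 in (50.0, 250.0):
        print(f0, output_rates(f0))
```

With these assumptions a 50 Hz fundamental gives 6.25 spectra, 800 Burst digits, and 200 binary digits per second, and a 250 Hz fundamental gives 31.25, 4000, and 1000.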
 
 If one rejects the serial approach, a parallel analysis may be 
 implemented. The idea of partial convolutions can still be used at the 
expense of added hardware. If one is interested in the first H+1
harmonics, H+1 data streams must be maintained in parallel. This implies
H+1 transform units, coefficient computation hardware, and APM's.
 
Figure 11. Serial Implementation

A more subtle approach is also possible. Fix the input sampling rate
at (bd(H+1))/T. Thus, $a_H$ and $b_H$ are obtained directly from the sampled
input stream. To obtain the $a_h$'s and $b_h$'s for h=0 to H-1, one may
interpolate the waveform from the known samples. This is shown in
Figure 12. Defining $t_h$ as the time between sample points for harmonic
h, one observes the following relations:

$$t_6 = (8/7)\, t_7 \qquad t_3 = (8/4)\, t_7$$

$$t_5 = (8/6)\, t_7 \qquad t_2 = (8/3)\, t_7$$

$$t_4 = (8/5)\, t_7 \qquad t_1 = (8/2)\, t_7$$

$$t_0 = (8/1)\, t_7$$
 
 Linear interpolation is well suited to Burst processing. [6] If 
 one slides a window between two Bursts, one observes an interpolation 
 between the two known values. This results from the unary properties 
 of Burst. Figure 12 demonstrates this interpolation. To perform the 
 various convolutions in parallel, one need only use these interpolations 
as the necessary sample points which are passed through the H+1
 transform units. 
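
The sliding window idea can be sketched as follows. This is an illustration only. It assumes compacted Bursts with the ones at the leading end and a window exactly one Burst wide, and the function names are hypothetical. The actual bit ordering and window arrangement of the prototype may differ.

```python
def compact_burst(value, b):
    """A compacted b-digit Burst: `value` ones followed by zeros."""
    return [1] * value + [0] * (b - value)

def sliding_window_counts(first, second):
    """Count the ones seen by a window one Burst wide as it slides, digit by
    digit, from the first Burst onto the second."""
    b = len(first)
    stream = first + second
    return [sum(stream[s:s + b]) for s in range(b + 1)]

if __name__ == "__main__":
    counts = sliding_window_counts(compact_burst(9, 10), compact_burst(5, 10))
    print(counts)
```

Under these assumptions the count starts at the value of the first Burst, ends at the value of the second, and never leaves the range between them, so every window position yields a usable intermediate sample. Whether the progression is strictly linear depends on details of the hardware that this sketch does not capture.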
 
 In this prototype, the serial approach was chosen for hardware 
 implementation. It was felt that speech does exhibit enough redundancy 
 to allow a serial computation. Hardware costs were also a factor in 
 the design. 
 
 6.5 ASYNCHRONOUS PULSE MULTIPLIER 
 
 Harmonic self sampling requires a pitch synchronous, variable rate 
 clock. The speech input must be sampled at a rate dependent on two 
 parameters: T, the fundamental period of the speech; and h, the 
harmonic being computed. If the transform unit consists of 32 points,
each 16 bits in length, 512 pulses must be inserted in the fundamental
 period. This is accomplished using the design shown in Figure 13. 
 
Given pulses indicating the beginning of each fundamental period,
the APM measures the present fundamental period and uses this value as
an estimate of the next period. Although not essential to the basic
concept, this technique eliminates the necessity of delaying the input
waveform by one period.

Figure 12. Burst Interpolation

Figure 13. Asynchronous Pulse Multiplier
 
 A standard method of frequency multiplication is to use a phase-locked 
 loop on the harmonics of a clock. Because of the inertia present 
 in such a method, it was rejected in favor of a more direct approach. A 
 high speed time reference is divided by 512 and by h, the 
 harmonic to be computed next. The divided clock drives the a-counter, 
 which measures the time T. During the next fundamental period, the 
 a-counter is compared to a counter driven directly from the time 
 reference. Each time this clock counter equals the staticized a-counter, 
 a pulse is generated and the clock counter is cleared. A pair of 
 counters (a, b) is used so that one value is staticized while the next 
 period is being computed. 
 
 The clock, implemented in 10000 series ECL circuitry, runs at 
 72 MHz. Assuming a maximum fundamental frequency of 250 Hz, the 
 fundamental period will be estimated to within 0.17% for h equal to 0, 
 and to within 1.4% for h equal to 7. 
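
 The counter arithmetic can be checked with a short sketch. It assumes 
 that the divisor for harmonic h is 512(h+1), since h is indexed from 0, 
 and that the counters truncate; both are assumptions, and the function 
 names are illustrative only. 

```python
CLOCK_HZ = 72_000_000      # reference clock rate given in the text
PULSES_PER_PERIOD = 512    # 32 points x 16 Burst digits

def a_counter(fundamental_hz, h):
    """Divided-clock ticks counted in one fundamental period (truncating)."""
    ticks_per_period = CLOCK_HZ // fundamental_hz
    return ticks_per_period // (PULSES_PER_PERIOD * (h + 1))

def pulses_next_period(fundamental_hz, h):
    """Pulses emitted when the clock counter is cleared each time it reaches the stored count."""
    ticks_per_period = CLOCK_HZ // fundamental_hz
    return ticks_per_period // a_counter(fundamental_hz, h)

def resolution_percent(fundamental_hz, h):
    """One count expressed as a percentage of the stored count."""
    return 100.0 / a_counter(fundamental_hz, h)
```

 Under these assumptions a 250 Hz fundamental gives stored counts of 562 
 for h = 0 and 70 for h = 7, so one count is a little under 0.18% and a 
 little over 1.4% of the stored value, in line with the figures quoted. 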
 
 A simulation study was undertaken to determine the accuracy of the 
 period estimation for various fundamental periods. Choosing the period 
 values at random, an estimate was computed for the eight harmonics. The 
 relative error averaged over the eight results for each fundamental 
 period tested is shown in Table 1. The error is not a strictly 
 increasing function of frequency. Since we are essentially performing an 
 integer division of the period, different period values incur different 
 truncation errors relative to the reference clock. This accounts for the 
 local discontinuities. It is noteworthy that for a fundamental as high 
 as 1152 Hz, an average error of only 1.44% is observed. 
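
 A study of this kind can be sketched as follows. The error measure used 
 here (the fraction of the ideal count lost to truncation, averaged over 
 eight harmonics with divisor 512(h+1)) is an assumption and is not 
 necessarily the measure behind Table 1. 

```python
CLOCK_HZ = 72_000_000      # reference clock rate given in the text
PULSES_PER_PERIOD = 512    # 32 points x 16 Burst digits

def period_error_percent(fundamental_hz, h):
    """Relative truncation error of the stored count for harmonic h (assumed measure)."""
    exact = CLOCK_HZ / fundamental_hz / (PULSES_PER_PERIOD * (h + 1))
    counted = int(exact)  # a hardware counter keeps only whole ticks
    return 100.0 * (exact - counted) / exact

def average_error_percent(fundamental_hz, harmonics=8):
    """Mean of the per-harmonic errors for one fundamental."""
    errors = [period_error_percent(fundamental_hz, h) for h in range(harmonics)]
    return sum(errors) / harmonics
```

 Calling average_error_percent over a list of fundamentals shows the 
 same kind of uneven growth with frequency, because the truncated 
 remainder varies from one fundamental to the next. 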
 
     Fundamental Frequency (Hz)     Average Error (%) 

               39.6                       .06 
               79.9                       .06 
              115.2                       .13 
              144.0                       .11 
              195.8                       .24 
              246.2                       .23 
              281.2                       .30 
              303.8                       .21 
              360.0                       .37 
              426.2                       .23 
              524.8                       .59 
              600.5                       .46 
              655.2                       .75 
              720.0                       .63 
              818.6                      1.01 
             1023.1                      1.4 
             1152.0                      1.44 

 Table 1. 
 
 6.6 COEFFICIENT COMPUTATION 
 
 Given the results of the partial convolutions, the operations 
 indicated in Eq. (3.7) must be performed. A block diagram describing 
 the required operations is shown in Figure 14. The partial convolutions 
 for a given fundamental period are summed together in a counter. The 
 result is normalized with respect to h+1, the number of convolutions 
 performed. The result must then be squared and summed with the 
 corresponding sin/cos coefficient, and the magnitude of the spectral 
 line is then produced by taking the square root. 
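
 The sequence of operations can be sketched in ordinary arithmetic. The 
 sketch assumes that Eq. (3.7) forms the magnitude as the square root of 
 the sum of the squared sine and cosine coefficients; the function name 
 and argument layout are illustrative only. 

```python
import math

def spectral_magnitude(sin_partials, cos_partials):
    """Combine the partial convolutions for one harmonic into a spectral-line magnitude."""
    if not sin_partials or len(sin_partials) != len(cos_partials):
        raise ValueError("need equal, non-empty lists of partial convolutions")
    count = len(sin_partials)        # h + 1 partial convolutions
    b = sum(sin_partials) / count    # normalized sine coefficient
    a = sum(cos_partials) / count    # normalized cosine coefficient
    return math.sqrt(a * a + b * b)  # square, add, square root

magnitude = spectral_magnitude([3, 3], [4, 4])
```

 The hardware performs the same three steps on Bursts rather than on 
 floating point numbers, so its results are scaled approximations. 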
 
 
 The implementation of squaring, adding, and obtaining the square root 
 is based on the unary properties inherent to Burst processing. In a 
 compacted Burst, the value is contained in the location of the 1-0 
 boundary. It is a positional attribute. This property lends itself to 
 trivial function implementations. If the outputs of a Burst register are 
 correctly connected to predetermined inputs of a second register, the 
 receiving register will contain the Burst approximation to the 
 function. Figures 15 and 16 show a squaring and a square root 
 implementation using this idea. One notices that appropriate scaling is 
 necessary. 
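
 The wiring idea can be sketched in software. The sketch assumes 
 16-digit compacted Bursts and the scalings k*k/16 and sqrt(16*k), each 
 rounded to the nearest integer. These scalings and all names are 
 assumptions; the actual connections of Figures 15 and 16 may differ. 

```python
import math

N = 16  # Burst length in digits (the 16-digit case)

def compact_burst(value, length=N):
    """Compacted Burst: `value` ones followed by zeros."""
    return [1] * value + [0] * (length - value)

def square_scaled(k, length=N):
    """Assumed scaling for squaring: k*k/length, rounded to nearest."""
    return math.floor(k * k / length + 0.5)

def sqrt_scaled(k, length=N):
    """Assumed scaling for the square root: sqrt(k*length), rounded to nearest."""
    return math.floor(math.sqrt(k * length) + 0.5)

def wiring_table(func, length=N):
    """For each output position j, find the input position it is wired to (None = tied low)."""
    table = []
    for j in range(1, length + 1):
        source = next((k for k in range(1, length + 1) if func(k) >= j), None)
        table.append(source)
    return table

def apply_wiring(burst, table):
    """Copy each wired input digit to its output position; no arithmetic is performed."""
    return [burst[src - 1] if src is not None else 0 for src in table]

SQUARE_WIRING = wiring_table(square_scaled)
SQRT_WIRING = wiring_table(sqrt_scaled)
```

 Because the function is monotonic, output position j only needs to know 
 whether the input boundary has passed one fixed input position, so a 
 fixed set of connections evaluates the function. 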
 
 Burst addition may be implemented in several ways. [4,5] The method 
 chosen consists of observing the odd pulses of the addend and the even 
 pulses of the augend. The output, as shown in Figure 17, is a scaled 
 approximation of the desired result. It is not possible to guarantee a 
 compacted output, so one must perform compaction before further processing 
 is allowed. Combining the three function generators to obtain the spectral 
 coefficient, one arrives at the logic depicted in Figure 18. The squaring 
 and addition connections are combined in one step. The final result is 
 routed to the appropriate output display. 
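
 The addition scheme and the need for compaction can be sketched as 
 follows. The sketch assumes that "odd pulses" and "even pulses" refer 
 to the odd-numbered and even-numbered digit positions of 16-digit 
 compacted Bursts; the names are illustrative only. 

```python
def compact_burst(value, length=16):
    """Compacted Burst: `value` ones followed by zeros."""
    return [1] * value + [0] * (length - value)

def burst_add(addend, augend):
    """Take positions 1, 3, 5, ... from the addend and 2, 4, 6, ... from the augend."""
    return [addend[i] if i % 2 == 0 else augend[i] for i in range(len(addend))]

def compact(burst):
    """Move all ones to the front so the value is again a single 1-0 boundary."""
    ones = sum(burst)
    return [1] * ones + [0] * (len(burst) - ones)

raw = burst_add(compact_burst(5), compact_burst(8))
result = compact(raw)
```

 Here the raw output holds 7 ones, a scaled approximation of (5 + 8)/2, 
 but a zero sits ahead of the last one, so it is not compacted until 
 compact() is applied. 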
 
 6.7 INCREASED COMPUTATION ACCURACY 
 
 The computations described in the previous section were implemented 
 with 16 Burst digit accuracy. Assuming a uniform probability distribution 
 for the possible input values, a mean square error of 0.065 was obtained 
 for the squaring operation, 0.064 for the square root operation, and 0.25 
 for the addition. Under the same uniform input distribution, the mean 
 square error for the combined operations using 16 bits is 0.446. This 
 should be regarded as an upper bound. It is generally accepted that 
 speech exhibits a near Gaussian distribution. Such a distribution would 
 effectively reduce this mean square error. 
 
 Given any input distribution, one may reduce this computational error 
 arbitrarily close to zero. Assuming a fixed number of bits for the input 
 and output values, one may increase the number of bits used in the 
 intermediate calculations. The principles described in the previous 
 section remain valid. Figure 19 demonstrates this for the case of 10 bit 
 input/output values and 20 bit function evaluations. 

 A simulation study has shown that the MSE decreases in an approximately 
 exponential manner with increasing bit length. The results are shown 
 graphically in Figure 20. The function is not smooth: one observes 
 large discontinuities for lengths of 19, 24, and 43 bits. These values 
 should be considered when making improvements. 
 
 Figure 20. MSE vs Length 

 7. CONCLUSION 
 
 A real-time speech analyzer using Burst Processing has been implemented. 
 It may be viewed as a digital implementation of the analyzer portion of a 
 spectrum channel vocoder. The speech input is parameterized into a 
 number of coefficients representing the spectral envelope and one 
 parameter representing the fundamental frequency. 

 It has been shown that, by appropriately varying the input sampling 
 rate, a single transversal filter may be used to obtain the required 
 coefficients. Thus, the idea of Harmonic Self-sampling was introduced. 
 The flexibility of the Burst implementation is demonstrated by the fact 
 that essentially the same hardware can be used to generate other 
 orthogonal transforms. Thus, Hadamard, Chebyshev, and Karhunen-Loeve 
 transforms are also possible. 
 
 We may conclude that in the area of speech processing, Burst 
 representation does indeed provide a promising alternative to conventional 
 binary systems. The analyzer described is an important first step in 
 demonstrating this applicability. 
 

 REFERENCES 

 [1] Campanella, S. and Robinson, G., "A Comparison of Orthogonal 
 Transformations for Digital Signal Processing," IEEE Transactions 
 on Communication Technology, COM-19, December 1971. 

 [2] Robinson, G. and Granger, R., "Fast Fourier Transform Speech 
 Compression," Proceedings of the 1970 IEEE International 
 Conference on Communications, paper 26-5, June 1970. 

 [3] Poppelbaum, W. J., Appendix to "A Practicability Program in 
 Stochastic Processing," Department of Computer Science, 
 University of Illinois, March 1974. 

 [4] Bracha, E., "Burstcalc (A Burst Calculator)," Report 
 UIUCDCS-R-75-769, Department of Computer Science, University of 
 Illinois, October 1975. 

 [5] Taylor, G., "An Analysis of Burst Encoding Methods," Report 
 UIUCDCS-R-75-770, Department of Computer Science, University 
 of Illinois, December 1975. 

 [6] Mohan, P., "The Application of Burst Processing to Digital FM 
 Receivers," Report UIUCDCS-R-76-780, Department of Computer 
 Science, University of Illinois, January 1976. 

 [7] Pleva, R., "A Microprocessor-Controlled Interface for Burst 
 Processing," Report UIUCDCS-R-76-812, Department of Computer 
 Science, University of Illinois, July 1976. 

 [8] Wolff, M., "Transmission of Analog Signals Using Burst Techniques," 
 Report UIUCDCS-R-77-838, Department of Computer Science, 
 University of Illinois, January 1977. 

 [9] Cohen, J., "An Automatic Process for Pitch Period Extraction of 
 Speech," Masters Thesis, Department of Electrical Engineering, 
 University of Illinois, 1970. 

 [10] Noll, A., "Short Time Spectrum and Cepstrum Techniques for Vocal 
 Pitch Detection," Journal Acoustical Society of America, 
 Vol. 36, 1964. 

 [11] Flanagan, J., "Speech Analysis Synthesis and Perception," Academic 
 Press, Inc., New York, 1965. 

 [12] Bergland, G., "Fast Fourier Transform Hardware Implementations-- 
 An Overview," IEEE Transactions on Audio and Electroacoustics, 
 Vol. AU-17, June 1969. 

 [13] Liu, J., "Application of Burst Code and Burst Processing Techniques 
 in Communication Systems," Internal Memo, May 1977. 

 [14] Tolstov, G., "Fourier Series," Prentice-Hall, Inc., New Jersey, 1962. 

 [15] Schroeder, M., "Vocoders: Analysis and Synthesis of Speech," 
 Proceedings of the IEEE, Vol. 54, No. 5, May 1966. 

 [16] Poppelbaum, W. J., "Statistical Processors," Advances in Computers, 
 Vol. 14, 1976. 

 [17] Poppelbaum, W. J., "Application of Stochastic and Burst Processing 
 to Communication and Computing Systems," Department of 
 Computer Science, University of Illinois, March 1976. 
 
 
 APPENDIX 
 CIRCUIT DRAWINGS 
 
 REPORT DOCUMENTATION PAGE 

 1. REPORT NUMBER: UIUCDCS-R-77-870 

 4. TITLE (and Subtitle): APPLICATION OF BURST PROCESSING TO THE 
 SPECTRAL DECOMPOSITION OF SPEECH 

 5. TYPE OF REPORT & PERIOD COVERED: M.S. Thesis/June 1977 

 6. PERFORMING ORG. REPORT NUMBER: UIUCDCS-R-77-870 

 7. AUTHOR(s): Christ John Xydes 

 8. CONTRACT OR GRANT NUMBER(s): N00014-75-C-0982 

 9. PERFORMING ORGANIZATION NAME AND ADDRESS: Department of Computer 
 Science, University of Illinois at Urbana-Champaign, Urbana, 
 Illinois 61801 

 11. CONTROLLING OFFICE NAME AND ADDRESS: Office of Naval Research, 
 Code 437, Arlington, Virginia 22217 

 15. SECURITY CLASS. (of this report): Release Unlimited 

 16. DISTRIBUTION STATEMENT (of this Report): Distribution Unlimited 

 19. KEY WORDS: Burst Processing, Harmonic Self Sampling, Vocoder, 
 Fourier Transform, Block Sum Register, Transversal Filter 

 20. ABSTRACT: 

 The application of Burst Processing to the problem of spectral 
 decomposition of speech is discussed. It is shown that such a 
 representation provides a viable alternative to conventional speech 
 analyzers. A specific Burst implementation is presented. 
 