The Acoustic Properties of Vowels and Consonants
CSD 349: Speech and Hearing Science

Module Objectives

1. Describe movement patterns of the articulators and the relationship between movement and velocity.

2. Describe the coding system for vowels and consonants.

3. Describe how the acoustic properties of the vocal tract are determined.

4. Describe how the vocal tract shapes the input signal.

5. Describe digital techniques for speech analysis.

6. Segment and interpret a spectrogram.

Module objectives relate directly to these course objectives:

Course objective 3: Describe the acoustic and physiologic characteristics of sounds.

Course objective 4: Analyze speech and voice by using instrumental and non-instrumental assessments and interpret the measures obtained.

The following will help you meet the objectives of this module:

1. Take the pretest in the quiz tool to assess what you already know about the acoustic characteristics of consonants and vowels.

2. Read the chapter on the acoustics of vowels and consonants in your textbook (pages 98-143), and about spectrograms (pages 276-285).

3. Complete the activities in the course module to give you more practice with the material.

4. Complete the worksheet on the acoustic characteristics of vowels and consonants on page 3 of this module and return it to me either through ReggieNet mail, to my mail box in room 204 Fairchild, or put it under my door (room 310 Fairchild).

5. Complete the lab project in the assignments tool.

6. Take the post-test in the quiz tool to assess your knowledge of the acoustic characteristics of consonants and vowels.

Articulatory movements and velocities

Production of speech, of course, is influenced by the movement of the articulators. One part of speech science involves measuring the movement of the articulators. As far as speed is concerned, we can rank the articulators from fastest to slowest. Here is an activity to practice that ordering.  Hyperlink to Ordering Activity The reference point for measuring which articulator is faster or slower is the maxilla. We could have chosen the mandible, but by choosing the maxilla, we have chosen a point closer to the pivot point of the oral cavity, which is the condyle.

As far as movements go, there will be larger displacements more peripherally. Take the example of a group of skaters linking arms, going in a circle with one as the pivot: the last skater is moving very quickly, whereas the pivot is hardly moving at all. In the drawing below, the movement of points a, b, and c will look very different depending on whether the reference is the mandible or the maxilla. The tongue (dorsum [point b in the picture], tip [point c]) and lower lip are fast relative to the maxilla, because they get an assist from the mandible. If the reference point were the mandible, the tongue could appear almost stationary, simply because when the mandible moves up, so does the tongue.


With articulation, we often think of spatial targets (for example, the tongue tip position for /s/).  But targets can also be acoustic, perceptual, or force (such as air pressure).  Some sounds have to reach a certain force, for example.  An example of a perceptual target would be a sound that "sounds right."

You do not have to be too precise with the movement of your articulators to be understood, especially in the case of vowels. The articulators may undershoot their target; with vowels, undershoot usually shifts the sound toward a more central, neutral position. You do have to be more particular in achieving consonant targets. Remember we said that consonant articulation requires more fine adjustment and movement of the articulators. We can lop off the second element of a diphthong and still be intelligible. Say "city by the bay" slowly and quickly three times to see this. At the faster speed, you're not really saying the second element of the vowel in "bay," yet you are intelligible.

There's a strong positive relationship between velocity and displacement, both within and between speakers: the farther an articulator has to move, the faster it will be moved. Say the nonsense syllables /iti/ and /ata/. In these syllables, the displacement from /a/ to /t/ is greater than from /i/ to /t/, and velocity is adjusted accordingly for a smooth transition. Of course, if you think of accuracy, the farther the target, the less accurate the movement, so there is a trade-off between speed and accuracy.
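The velocity-displacement relationship can be sketched with a little arithmetic. This is a toy illustration, not measured data; the 50 ms transition time and the displacement values are assumptions made up for the example.

```python
# Illustrative values only: transition duration held constant, displacement
# varied, as in /iti/ versus /ata/.
TRANSITION_TIME_S = 0.05  # assumed ~50 ms sound-to-sound transition

def mean_velocity(displacement_cm, transition_time_s=TRANSITION_TIME_S):
    """Average articulator velocity (cm/s) needed to cover a displacement
    in a fixed transition time."""
    return displacement_cm / transition_time_s

v_small = mean_velocity(0.5)  # short trip, as from /i/ up to /t/
v_large = mean_velocity(1.5)  # longer trip, as from /a/ up to /t/
print(v_small < v_large)  # True: greater displacement, greater velocity
```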


There is a size factor operating in regard to movement, too. A larger person, who has to move the articulators a longer distance, will move them faster than a smaller person would.


Time units seem to be an important factor even early in life, even in babbling. Babbling sounds like real speech because the time transitions between sounds stay fairly constant. The velocity of the articulators can be varied, and does vary developmentally. Children have slower speech movements and slower articulation rates than adults do, as well as slower articulator velocities. Elderly adults also have slower articulation rates and slower velocities. Transition times between sounds stay fairly constant throughout the lifespan, unless speech is disordered (for example, as in Parkinson's disease).

When you increase your speaking rate, you don't necessarily increase the velocity of movement. The waveform on the left is of the phrase "She was having trouble finding the quiet floor in the library."
You can see that the utterance takes approximately 6 seconds to say, and since there are 17 syllables, the articulation rate is approximately 2.8 syllables per second.

The waveform under the slow speech is the same phrase spoken at a faster rate. The utterance takes approximately 3 seconds to say, so the articulation rate is approximately 5.67 syllables per second.
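The articulation rates above are just syllables divided by seconds. A minimal sketch using the counts and durations given for the example phrase:

```python
def articulation_rate(num_syllables, duration_s):
    """Articulation rate in syllables per second."""
    return num_syllables / duration_s

slow = articulation_rate(17, 6.0)  # the phrase spoken slowly
fast = articulation_rate(17, 3.0)  # the same phrase spoken quickly
print(round(slow, 2), round(fast, 2))  # 2.83 5.67
```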


Look at the two waveforms in these pictures. The shape of the waves is similar between fast and slow speech, but the signal in the fast speech trace is compressed. One reason for compression is undershoot--the articulators aren't quite reaching the vowel targets. In the faster sample, vowel duration and pause duration are reduced. Consonants do not get reduced as much. The intelligibility of an utterance spoken at a fast rate of speech depends on perceptual factors, too. One factor is familiarity with the dialect or language. If a listener is familiar with the language, he or she can understand speech with undershoot.

As noted earlier, faster rates of speech don't necessarily mean the velocity of the articulators increases. The velocity of the articulators (not to be confused with articulation rate) can decrease, stay the same, or increase. If you want to say something at a louder volume at a fast rate, you probably do have to increase velocity, which takes more work.


 Another process we can use to avoid increasing the velocity of the articulators is assimilation. When you assimilate, you are changing the movement of a single articulator in anticipation of a neighboring sound.  To see what is meant by assimilation, say the word "eat" and feel where your tongue is for the last sound /t/. Now, say the word "the," and feel where your tongue is for the first sound (the "th") in the word. Now, let's put those words together. Say the sentence "Eat the cake." Where is your tongue at the juncture of the "t" in "eat" and the "th" of "the?" The /t/ has been assimilated to the place of the "th."

There are different types of assimilation processes. One type is called partial assimilation: one sound takes on features of another sound, but the assimilation does not result in a totally new sound. The "t" and "th" in "Eat the cake" is an example of partial assimilation. The "t" in "eat" is still a "t"; it just has features of the "th." Another example of partial assimilation is the /k/ sound in "caught" and "key." Say those two words--you should feel that your tongue is further back in your mouth for "caught," because the vowel sound in that word is a low back vowel. When you say the word, you anticipate the low back vowel and the /k/ takes on the features of that vowel. But the two /k/ sounds are still the same phoneme.

With complete assimilation, one phoneme changes into another. Say the phrase "ten cards" and feel what your mouth is doing when it says the sound represented by the letter "n" in "ten." Because in "ten cards" the tongue is moving back in anticipation of the /k/ in "cards," the result is a velar nasal /ŋ/. So the "n" is not the /n/ sound, but a completely different phoneme, /ŋ/. We can see the same phenomenon in "think," "bank," and "anger."

The assimilation we have seen changes a sound in anticipation of a sound that will follow. This is called anticipatory assimilation. Anticipatory assimilation is right to left; a sound is influenced by a following sound.  We have seen this in "ten cards" and "key" vs. "caught".  The articulation of the /n/ and /k/ is influenced by the following sound. 

Carryover assimilation is left to right; a sound is influenced by a preceding sound. The plural in "cats" is voiceless because of the preceding voiceless consonant /t/, and the plural in "dogs" is voiced because of the preceding voiced consonant /g/.

Assimilation has an effect on articulation in three ways: place, voicing, and manner. Examples of place are "ten cards" and "eat the cake"; the sounds that are assimilated have changed their place of articulation. To test your knowledge about place of articulation, answer the following question:

 Show/hide comprehension question...


For voicing, we've seen the example of "dogs" and "cats"--the sounds that are assimilated have changed their voicing. Manner is how the sounds are articulated. Look at the sounds represented by the letters "d" and "u" in the word "educate." These were originally the stop plosive /d/ and the glide--the first sound in "you," written phonetically as /j/. When the sounds are said in the word, however, we see a process called palatalization; the sound becomes an affricate in "educate." This is due to assimilation of /d/ to the palatal place of articulation of the /j/.


Co-articulation is another process that helps us avoid increasing the velocity of the articulators for speech. In co-articulation, a feature of one phone is used or carries over during the production of another phone or phones. In essence, two articulators are moving at the same time for different phonemes. This is different from assimilation, where one articulator modifies its movements because of context. For the word "boom," for example, the tongue is in position for the /u/ during the production of the /b/, and the lips are rounded during the /b/. Say "boom" a couple of times to feel what your tongue and lips are doing, and then say "beam" a couple of times--there is no rounding there, and the production of the /b/ is quite different. While you're saying "beam," though, the velum is also co-articulating: it is elevated for /b/ but must be fully lowered to produce the /m/. The vowel /i/ in "beam" takes on features of the /m/, as it is slightly nasal. The velum begins lowering after the /b/ is produced, continues its lowering through the vowel, and completes it during /m/. These are good examples of co-articulation, which makes it possible to speak quickly and effectively. Co-articulation is possible because the articulators are able to move somewhat independently of each other.


As with assimilation, there are different types of co-articulation. Anticipatory co-articulation is preparation for the next sound. One example is the word "construe." Say the word and notice where your lips are. With "construe," the lips must be protruded for the /u/, and they go to that position as soon as they can: immediately after the first vowel, they start protruding. In the word "freon," the velum starts to lower after the /r/, in preparation for the nasal /n/ at the end. Anticipatory co-articulation involves neural pre-programming; otherwise the words could not be properly articulated.


Another type of co-articulation is carryover, which means that the posturing for one sound carries over to the next.  For the word "stoop," the lips stay protruded through the /p/ segment.  And for "steep," they stay retracted during the /p/ segment. 


There are some constraints for co-articulation.  The constriction for fricatives can't be too small or too large—size here is critical.  If the constriction is too large, you lose friction, and won't be able to produce the sound. To see this, say the words "cease" and "Seuss." In both words, the tongue position is very similar, and the lips are able to co-articulate for the vowels. With the /i/ in "cease," they are retracted. With the /u/ in "Seuss," they are rounded. Because of the fricative /s/, the tongue tip will be very much constrained. The tongue tip can go up or down (most people put theirs up), but the constriction has to be the same.


Speech development and articulation

There are many stages of development for speech as a child grows, and there are also age differences in speech production.  As one example of development of the velum, Thompson and Hixon (1979) measured nasal airflow in 3-year-old children and adults as they said the nonsense syllable /ini/.  The 3-year olds had some spread of nasal airflow to the adjacent vowels, but the adults had more spread.  So as the child develops, he or she has more spread of features during co-articulation.  


It may be that in the early months of life, say 0 to 8 months of age, the goal for speech is more about global production than assimilation or co-articulation. At this age, the child lacks fine motor skills, and motor movements can be seen as "clumsy." Between the age of 8 months and about 12 years, the child develops finer motor skills. More of this development occurs in the early years than later (up to around the age of 5); most refinement occurs fairly early in the child's life. As children mature, they begin to differentiate sounds in their production. Generally, their production is more variable than adults', and the velocity and movement of the structures are slower.

At age 12 and beyond, speech movements become "overlearned." The child gains greater efficiency and uses more parallel processing; co-articulation may spread to surrounding sounds as we get older. As we know, the brain has less plasticity later in life. After the age of 12 or so, movement patterns are very difficult to change because the motor pattern is ingrained. We also know that if you want to learn a second language and speak it without an accent, it is best to learn that language before the age of 12. Plasticity is also why we have early intervention: the philosophy of early and aggressive treatment. The child's articulation movements and speech continue to develop and do not become adult-like until the age of 14 or 16, so it is quite a long-developing process.


Looking at Sound through a Spectrogram

Sound is a three-dimensional event: amplitude, frequency, and time. We can portray a sound acoustically in one of three ways. The first is through a waveform, which plots amplitude on the y-axis and time on the x-axis. The second is through a spectrum, which plots frequency on the x-axis and amplitude on the y-axis. The third way is through a spectrogram, which plots all three dimensions. As can be seen in the picture here, which is from page 107 in your textbook, the frequency of the sound is plotted on the y-axis, from bottom to top. The lower frequencies are at the bottom, and the higher frequencies are at the top. The bottom line is 0 Hz. Each horizontal line in the picture on the left may represent about 1,000 Hz, and it is common to show up to 8,000 Hz. Time is plotted on the x-axis, from left to right. Amplitude or intensity is represented by shading: the stronger the signal, the darker it is. In the picture on the left, the darker bars are bands of enhanced energy within the formant regions. If we were to draw a vertical line in the middle of each of the dark bars for each of the vowels, we would have the center frequency of the formant. As we go up in frequency on the spectrogram, the bandwidths get wider and the formants are not as distinct; they run out of energy.
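Module objective 5 mentions digital techniques for speech analysis, and a spectrogram like the one described here is usually computed with a short-time Fourier transform. Below is a minimal sketch in Python with NumPy; the synthetic 100 Hz "vowel" is a made-up stand-in for a recorded signal, not the textbook example.

```python
import numpy as np

def spectrogram(signal, fs, win_s=0.005, hop_s=0.0025):
    """Magnitude spectrogram: time on the x-axis, frequency on the y-axis,
    amplitude as the level of each cell (shading, in a printed spectrogram)."""
    win = int(win_s * fs)
    hop = int(hop_s * fs)
    window = np.hanning(win)
    frames = [signal[i:i + win] * window
              for i in range(0, len(signal) - win, hop)]
    spec = np.abs(np.fft.rfft(frames, axis=1))  # one spectrum per time frame
    freqs = np.fft.rfftfreq(win, 1 / fs)
    times = np.arange(len(frames)) * hop_s
    return spec.T, freqs, times  # rows = frequency, columns = time

# A synthetic one-second "vowel": a 100 Hz glottal source approximated by
# summed harmonics (a hypothetical stand-in for a recorded signal).
fs = 8000
t = np.arange(fs) / fs
voice = sum(np.sin(2 * np.pi * 100 * k * t) / k for k in range(1, 20))
spec, freqs, times = spectrogram(voice, fs)
print(spec.shape)  # (frequency bins, time frames)
```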

 Show/hide comprehension question...



The vertical pulses you see here are related to the opening and closing of the vocal folds; the striations are the sounds emitted as the vocal folds open and close. The distance between the striations for an adult male represents approximately 1/100th of a second, and for an adult female, 1/200th of a second. So, for a female, the vertical striations will be closer together. The spectrogram on page 107 (depicted here) is of a male voice--there are fairly large spaces between the vertical striations. Check your knowledge of spectrograms so far. Look at this image of two spectrograms and determine which is the male voice and which is the female voice. Then answer the following question:

 Show/hide comprehension question...


Wide band and narrow band spectrograms

Spectrograms come in two "flavors": wide band and narrow band. You can see an example here on the left, which is the same image in your textbook on page 279, figure 13.4. Prominent features of narrow band spectrograms are horizontal bands, which represent harmonics. The filter bandwidth used to generate narrow band spectrograms is between 30 and 50 Hz. Narrow band spectrograms are usually used to measure fundamental frequency and intonation, and to count harmonics. As you know from your phonation lab, we now have other instruments to measure fundamental frequency. The wide band spectrogram is of more interest to us, because it shows the horizontal bands of energy, or formants. The filter bandwidth used to generate wide band spectrograms is between 300 and 500 Hz. This filter is broad enough NOT to resolve the energy into individual harmonics. Remember that the formants for one particular speaker will be essentially the same no matter what the fundamental frequency or harmonics are. Another advantage of wide band spectrograms is that they respond quickly to the onset and termination of a sound, so each glottal pulse is represented separately. Because the wide band spectrogram resolves the glottal pulses, it can make accurate temporal measurements of acoustic events. If you want to read a little more about these types of spectrograms, click here.
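The two "flavors" differ in the length of the analysis window: as a rough rule of thumb, a filter's bandwidth is about the reciprocal of its window duration. A sketch using the bandwidths above and a 100 Hz male fundamental:

```python
def window_for_bandwidth(bandwidth_hz):
    """Approximate analysis-window duration (seconds) for a given filter
    bandwidth; duration ~ 1 / bandwidth is a common rule of thumb."""
    return 1.0 / bandwidth_hz

narrow = window_for_bandwidth(45)   # narrow band: roughly a 22 ms window
wide = window_for_bandwidth(300)    # wide band: roughly a 3.3 ms window

glottal_period = 1 / 100            # male voice: pulses about 10 ms apart
print(narrow > glottal_period)  # True: window spans several pulses, so
                                # harmonics are resolved, pulses are smeared
print(wide < glottal_period)    # True: window is shorter than one period,
                                # so each glottal pulse shows up separately
```

This is why the narrow band display shows harmonics while the wide band display shows vertical striations for the individual glottal pulses.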

The acoustic properties of vowels


When a vowel is produced, the vocal tract is relatively unconstricted, which means vowels can be sustained. What is important in producing a particular vowel is the shape of the vocal tract, and with the vowels, the tongue is the major articulator. The two major acoustic cues that differentiate the vowels are duration and formants. You may recall from the Resonance module that the first formant, F1, is related to tongue height: F1 decreases as tongue height increases, so the higher the tongue's position in the mouth, the lower the F1. F2 is related to the front/back movement of the tongue; the further back the tongue, the lower the F2. Of course, we will also see a third formant for the vowels, but F3 contributes a qualitative difference, which relates to the way the vocal folds vibrate--how tightly they are adducted. F3 is also responsive to front vs. back constriction. Only the first two formants are necessary for distinguishing the vowels. The third formant adds naturalness, and makes the vowel sound like it is not synthesized. And it is the patterns of the formants, rather than the exact frequency values, that distinguish individual vowels. There's a great illustration of the way the first two formants characterize all the vowels on page 108 in your textbook, figure 5.27. You can see that the first and second formants for some vowels are very close together, which is why programs like Praat draw formant lines for vowels, as you can see here for the vowel /a/.

On the right is a visual of the production of /i/, /a/, and /u/, their formant frequencies, and a graph showing you the relationships of the formants for these vowels. This picture can also be found on page 108 in your textbook.


The picture on the left above is the vowel quadrilateral. You can see a similar picture in your textbook on page 105, figure 5.22. If you are unfamiliar with the symbols, your textbook, Appendix A, page 306, has a list of symbols and words that illustrate the sounds. The "front-back" distinction is a characteristic of tongue retraction, and corresponds to F2. The "high-low" distinction is a characteristic of tongue height, and corresponds to F1. The frequencies of the formants for the different vowels spoken by men, women, and children are found in your textbook, page 101, Table 5.1. You can also explore how these vowels are produced by looking at this website from the University of Iowa. If you click on "English" and then the vowels (monophthongs), you can see and hear their production. Let's look at the vowel /i/ as an illustration.

/i/: the vowel sound in "beat"

In English, the vowel /i/ is a high front, unrounded vowel. The tongue tip is forward in the mouth when the vowel is produced. Go over the production of the vowel on pages 98-102 in your textbook. Because the tongue is elevated, it forms a constriction at a point of peak velocity. Do you remember from the resonance module if that was a node or antinode? Check here to find out:

 Show/hide comprehension question...

When you say /i/, what happens to the posterior part of the tongue?

 Show/hide comprehension question...

The configuration of the posterior part of the tongue creates a constriction at a point of maximum pressure, or at a node. The result is a large pharyngeal cavity, as the tongue is out of the way of the back of the pharynx. The tongue has moved up and forward, and the lips are spread. This creates a small oral cavity and a larger pharyngeal cavity. The constriction at the point of peak velocity has the effect of lowering F1. Looking at Table 5.1, page 101, we find that F1 for /i/ is the lowest of all the vowels. The constriction at the point of maximum pressure tends to raise F2 and F3. As you know, a neutrally shaped male adult vocal tract would have resonant frequencies of 500, 1,500, and 2,500 Hz. We can see that when /i/ is produced, F1 = 270 Hz, F2 = 2290 Hz, and F3 = 3010 Hz. Thus, F1 is lowered, and F2 and F3 are raised.
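Those neutral-tract values of 500, 1,500, and 2,500 Hz follow from treating the vocal tract as a tube closed at one end (a quarter-wave resonator, from the Resonance module). A quick check, assuming a 17.5 cm adult male tract and a sound speed of 35,000 cm/s:

```python
SPEED_OF_SOUND_CM_S = 35_000  # approximate speed of sound in warm, moist air

def quarter_wave_resonances(length_cm, n_formants=3):
    """Resonances of a tube closed at one end (the neutral vocal tract):
    F_n = (2n - 1) * c / (4L)."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4 * length_cm)
            for n in range(1, n_formants + 1)]

print(quarter_wave_resonances(17.5))  # [500.0, 1500.0, 2500.0]
```

The /i/ values (270, 2290, 3010 Hz) then show how far the constriction shifts each formant away from these neutral positions.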


Differentiation of vowels: formants and duration

Figure 5.24 on page 106 in your textbook shows the vowel quadrilateral in terms of formant frequencies. The first formant frequency is correlated with the area at the back of the pharyngeal cavity, and tongue height. The second formant frequency is correlated with the length of the oral cavity. You can see ranges from 200 to 800 Hz for the first formant frequency (the y-axis), and from 600 to 2500 Hz for the second formant (the x-axis). This quadrilateral covers the formant frequencies given on page 101 for men, women, and children.

We've seen the picture on the left in the resonance unit (from page 107 of your textbook), but it is a good illustration of where the tongue is during the production of several vowels. With this illustration, you can track the lips, tongue tip, and back of the tongue. There will be a lowering of F2 as the front cavity is enlarged because of tongue retraction (and lip protrusion), as in the production of the word "food" (number 7), for example. There will also be a general rising of F1 from /i/ to /a/ as the pharyngeal cavity size is decreased, due to tongue lowering, and as lip opening is increased, due to jaw lowering. (The examples here are the word "heed," number 1, and "father," number 5.) There is also a lowering of F1 from /a/ to /u/ as the pharyngeal cavity size is increased, as the tongue rises, and as lip opening is diminished, as the jaw rises. The examples here are the word "father," number 5, and "food," number 7.


So you can see that formants distinguish vowels. There is another distinguishing characteristic that was mentioned earlier, and that is duration. Vowels with greater duration are called "tense," and those with less duration are called "lax." Here's a webpage that shows you which vowels are tense and which are lax. The tense vowels are in words such as "see," "say," "so," "sue," and "saw." Examples of lax vowels are in the words "sit," "set," "sat," and "soot." The lax vowels generally have less extreme tongue positions--they are more or less in the middle of the vowel quadrilateral. You can see them in the first vowel quadrilateral on this page. If you say "sit" and "seat," or "suit" and "soot," you might notice that the vowels in "sit" and "soot" are a little shorter than the vowels in "seat" and "suit." With these tense and lax vowels, though, we have another cue to distinguish them, and that is tongue position. So tenseness and laxness in English is a redundant feature.

Redundant features do help us get meaning, and make speaking and understanding easier. An example of redundancy can be seen in everyday expressions. If you've done something nice for someone, and you hear that person say "...very much," you are likely to say "You're welcome," even if you didn't hear the "Thank you." In phonetics, there are several redundancies. The voiceless stops /p/, /t/, and /k/ are aspirated at the beginning of stressed syllables, so there are two features that distinguish them from their voiced counterparts--aspiration and voicelessness. In English, then, vowel length is a redundant feature; formant frequency is more important in distinguishing one vowel from another. But this is not so in other languages, where duration alone can serve to distinguish vowels. Swedish has a rather complicated vowel system, as you can see here, with several short and long vowels. Japanese is another example: "oniisan," said with a long vowel in the middle, means "older brother," but "onisan," said with a short vowel, refers to a devil. You wouldn't want to get those vowels mixed up!



The acoustic characteristics of consonants 

Consonants have more diverse acoustic characteristics than vowels do, and more diverse acoustic cues for perception. With vowels, there are slow changes in the articulators, voicing is the only sound source, and they are produced with a relatively open vocal tract. Production of consonants requires rapid changes in the articulators and constriction of the vocal tract. Voicing is not the only sound source--voicing must be coordinated with aspiration (bursts of air) and frication (noise produced when air goes through a constriction).

Consonants are often discussed with respect to three major characteristics: voicing, manner of articulation, and place of articulation. Voicing is related to vocal fold vibration. The place of articulation is the placement of the major articulator for the consonant. You can see the places of articulation for the consonants in the picture on the right.

In regard to manner, we're looking at the shape of the vocal tract, which is of course more complicated. Manner includes how the consonants are produced--are they nasals, stops, semivowels, affricates, fricatives? As with vowels, consonants are produced with a sound source, with the sounds passing through the vocal tract. The movement of the articulators changes the resonance of the vocal tract: moving the tongue, lips, and jaw changes the shape of the vocal tract, and changing the shape of the vocal tract changes the formant frequencies.

Formants and consonants

As you recall from reading about the vowels, the first formant (F1) is affected by the size of the vocal tract constriction. For consonants, the size of the constriction gives a clue to manner; it is not related to the place of articulation. It is the second and third formants (F2 and F3) that are affected by the place of articulation. The picture on the left is a spectrogram of the nonsense syllable /aga/. The first formant has been outlined in green, and the second and third in red and blue. Notice that the first formant for /g/ is much lower than F1 for /a/; this is because the vocal tract is more constricted for /g/ than for /a/.

What is also seen in this picture are the formant transitions--these are the curved parts of the formants, and are rapid changes in the frequency of a formant for a vowel immediately before or after a consonant. You can see another picture of a formant transition here. The F2 transition in particular is a very important acoustic cue to the place of articulation for a consonant. The picture below is an illustration. It shows the F1 and F2 formant transitions for the stop consonants /b/, /d/, and /g/. Let's look at the first row, the bilabials. The F2 locus for bilabials is relatively low, under 600 Hz. For the /b/, all F2 transitions are rising. Now, look at the row for the stop consonant /d/. The F2 locus for this consonant is around 1800 Hz. The third example in that row has a level formant transition, and it is the only one in the picture. Notice that the last example in that row, /du/, has an F2 transition with a sharp fall. We can see that the F2 locus for alveolars is higher than that for bilabials. Finally, for velars, the F2 locus is high--around 3,000 Hz. For the velars, all the F2 transitions are falling.
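The locus values above can be turned into a toy lookup. This is only a sketch: real listeners use the whole transition pattern, not a single frequency, and the locus values are the approximate ones given in this paragraph.

```python
# Approximate F2 locus values from the description above: bilabial under
# about 600 Hz, alveolar around 1800 Hz, velar around 3000 Hz.
F2_LOCI = {"bilabial": 600, "alveolar": 1800, "velar": 3000}

def nearest_place(f2_locus_hz):
    """Guess place of articulation from a measured F2 locus (Hz)."""
    return min(F2_LOCI, key=lambda place: abs(F2_LOCI[place] - f2_locus_hz))

print(nearest_place(550))   # bilabial
print(nearest_place(1750))  # alveolar
print(nearest_place(2900))  # velar
```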

Resonance applies to consonants, too. A resonance can be considered a pole, or natural frequency. Anti-resonance also applies to consonants: it is a frequency-selective suppression of sound energy, determined by a center frequency and a bandwidth. We'll look at anti-resonance when we talk about nasal consonants.

Turn to the next page to start examining specific consonants.



Stop consonants (e.g., /b/, /p/, /d/, /t/, /k/, /g/)

Stops are transient, rather than sustained sounds. Their production requires a complete closure of the vocal tract. The flow of air through the oral cavity is blocked. So there is a silent interval followed by an abrupt onset of sound. Stops may be voiced or unvoiced. Try the sorting activity to review voicing and stops.

Hyperlink to Sorting Activity


Voiceless stops are characterized by a noisy onset called aspiration. The noise varies in frequency. There is no aspiration for voiced stops, because the vocal folds are adducted during their production; during voiceless stops, the vocal folds are abducted. During voiced stops, a voice bar can be produced during the closure. You can see a voice bar here. With voiced consonants, voicing starts less than about 30 milliseconds (msec) after the release of the stop, and voicing CAN occur during the closure. With unvoiced consonants, voicing starts more than about 50 msec after the release of the stop, and voicing CANNOT occur during the closure. In the first picture below, on the left, you can see the burst after the stop. Looking at the voice bar in the second picture, on the right, is this a voiced or voiceless stop? Click here for the answer. The duration of time between the release (the burst) and the onset of voicing is called voice onset time. Figure 6.13 in your textbook, page 132, gives a good view of the aspiration in voiceless stops, and the lack of voicing as they are produced. You can also see a view of voice onset time here, for the words "die" and "tie." Voice onset time (VOT) for a voiced stop is approximately 0 msec (0-30 msec), and for a voiceless stop, approximately 70 msec (50-80 msec).
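The VOT ranges above suggest a simple decision rule. This is a sketch of the idea, not a clinical measure; the cutoffs are the approximate values from the text.

```python
def classify_vot(vot_ms):
    """Rough English stop-voicing decision from voice onset time,
    using the ranges given above: voiced ~0-30 ms, voiceless ~50-80 ms."""
    if vot_ms <= 30:
        return "voiced"
    if vot_ms >= 50:
        return "voiceless"
    return "ambiguous"  # the 30-50 ms region is a gray zone

print(classify_vot(10))  # voiced, like the /d/ in "die"
print(classify_vot(70))  # voiceless, like the /t/ in "tie"
```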











Fricatives (e.g., /s/, /z/, /f/, /v/)

Fricatives are created by forcing air through a small constriction. As air flows through the constriction, a noise is produced due to turbulence. Fricatives are also continuants--their sound can be maintained. Like stops, fricatives may be voiced or voiceless. If you think about the production of /s/, the turbulence is produced at the front of the constriction--the tongue forms a groove, the turbulence is generated at the groove, and that produces the sound. With the voiced counterpart, /z/, you have the same turbulence, but also voicing. You can see a voice bar when the voiced fricative is produced. The voicing reduces the amplitude of the fricative energy. Look at the waveform and spectrogram of the words "see" and "zee" in the picture below to see those differences. "See" is the first word, followed by "zee."



The turbulence in /z/ acts as an antiresonator to suppress some of the voiced energy. In general, voiced sounds include low-frequency energy corresponding to vocal fold vibration. Voiceless sounds lack that low-frequency energy. With /s/ and /z/, we have a contrast in voicing.
















s.resonator.png Let's think about the resonating chamber when fricatives such as /s/ and /z/ are produced. The place of articulation for the fricatives can be seen in your textbook, page 125, figure 6.7. With these two sounds, the tube is very short, just from the alveolar ridge to the lips. The resonant frequency will be fairly high because the tube is short. The tube is also open at both ends. Because the tube is open at both ends, we get a 1/2 wave resonator, which means the wavelength is twice the length of the tube. You can see a model of this in the picture on the right. In this configuration, the smallest part of a wave that will fit is 1/2 wavelength, with an antinode, or point of maximum velocity, at each end. To determine the resonant frequency of a fricative, we have to modify the formula we used in the previous module. As you recall, the formula was



Since we now have a 1/2 wave, rather than a 1/4 wave resonator, the formula becomes Fn = nc/2L


The length of the tube is 2.5 cm, and we use 34,000 cm/second as the velocity of sound.

So it becomes 34,000/(2.5 × 2) = 34,000/5 = 6,800 Hz. This is the pole, or lowest resonant frequency.
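The half-wave calculation above generalizes to the higher resonances of the tube. A minimal sketch (the function name is our own; the speed of sound is the module's 34,000 cm/sec):

```python
def half_wave_resonances(length_cm, n_resonances=3, c=34_000):
    """Resonant frequencies (Hz) of a tube open at both ends:
    Fn = n * c / (2 * L), for n = 1, 2, 3, ..."""
    return [n * c / (2 * length_cm) for n in range(1, n_resonances + 1)]

# The 2.5 cm cavity in front of the /s/ constriction:
print(half_wave_resonances(2.5))  # [6800.0, 13600.0, 20400.0]
```

The first value, 6,800 Hz, matches the pole computed above; the higher resonances fall at whole-number multiples of it, since both ends of the tube are open.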

Resonance and anti-resonance (poles and zeroes)

poles.zeroes.png Fricatives are a little more complex than stops, as the pharyngeal/oral cavity behind the constriction creates an antiresonance, or zero. The antiresonance suppresses some of the energy of the sound. A resonance, also called a pole, is a natural frequency of the consonant. All consonants have resonances. An antiresonance, by contrast, is a frequency-selective suppression of sound energy; like a resonance, it is determined by a center frequency and a bandwidth. Fricatives, along with nasals, have antiresonances, also called zeroes. The source of the antiresonance for fricatives, as stated earlier, is the pharyngeal/oral cavity behind the significant constriction produced in the oral cavity. With most consonants, there is a constriction, and sometimes a complete closure. The smaller the constriction, the closer the pole-zero pair. The source of the antiresonance for nasals is the side branch produced by the oral cavity as air flows through the nasal cavity. In both cases, poles and zeroes usually pair with each other. A picture of the resonance/antiresonance curve is to the right.


Fricatives can be synthesized with one pole and a zero one octave below the pole. We usually do not care about the higher poles, because there is not enough energy. So, if we have one pole centered at 6,000 Hz, the zero is around 3,000 Hz. With fricatives, there is a second pole, usually at twice the frequency of the first, so in this example, the second pole is 12,000 Hz.
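The pole/zero layout just described can be written as a simple rule of thumb. A sketch only, for the synthesis recipe stated above (the function and dictionary keys are our own labels):

```python
def fricative_pole_zero(first_pole_hz):
    """Pole/zero layout for a simple fricative synthesis: the zero sits
    one octave below the first pole, and a second pole sits at twice
    the frequency of the first."""
    return {
        "pole1": first_pole_hz,
        "zero": first_pole_hz / 2,   # one octave below the first pole
        "pole2": first_pole_hz * 2,  # twice the first pole
    }

# The 6,000 Hz example from the text:
print(fricative_pole_zero(6_000))  # pole1 6000, zero 3000, pole2 12000
```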


More characteristics of fricatives

Fricatives produced toward the lips have a higher center frequency (pole) than those produced further back. Fricatives with lower source energy are those with obstructions close to the constriction. In the case of /f/, /v/, and the voiced and voiceless "th" sounds, the obstructions are the teeth. So you can see in the picture below, found on page 127 of your text, that the sound energy is low for those sounds, relatively high for /s/ and /z/, and moderately high for the first sound in "shoe" and the second consonant in "measure."
























Nasals (e.g., /m/ and /n/)

nasal.airflow.png Nasals are similar in production to voiced stops, except air flows through the nasal cavity. The velum controls the airflow through that cavity, and the airflow allows voicing to continue during the closure for the production of these consonants. The sound produced when there is closure in the oral cavity and air flowing through the nasal cavity has been called a nasal murmur. The picture at the left shows an open nasal cavity and the production of which nasal?

 Show/hide comprehension question...

As noted earlier, with the nasal consonants, the oral cavity becomes a side branch. The main branch, or resonating chamber, is the pharynx and nasal cavity. In terms of resonance, this tube is longer. In the standard vocal tract, we had a first formant frequency of 500 Hz. But we know that the longer the tube, the lower the formant. So the first formant frequency for nasals is about 250 Hz (for adult males). The reason for a low frequency formant is that the tube has become longer--it goes from the glottis to the pharynx to the nostrils.
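The halving of the first formant described above follows directly from the quarter-wave formula for a tube closed at one end (the glottis) and open at the other. A sketch, using the module's 500 Hz standard vocal tract; the 17 cm and 34 cm lengths are the values implied by that formula, not measurements from the text:

```python
def quarter_wave_f1(length_cm, c=34_000):
    """First resonance (Hz) of a tube closed at one end and open at the
    other: F1 = c / (4 * L)."""
    return c / (4 * length_cm)

# Standard adult male vocal tract vs. the longer glottis-to-nostrils tube:
print(quarter_wave_f1(17))  # 500.0 Hz
print(quarter_wave_f1(34))  # 250.0 Hz -- doubling the length halves F1
```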

Because the nasals are voiced and are continuants, on the spectrogram they can look similar to vowels. They are generally less intense, though, due to the antiresonances in their production. The picture below contrasts a consonant-vowel-consonant with a vowel-nasal-vowel. You can see the formant bands for the nasal in that second picture. You've seen the first spectrogram--what type of consonant is depicted here?

 Show/hide comprehension question...


Here is another view of the three nasal consonants in English, from your textbook, page 124, figure 6.6.

nasals.English.jpg You can see the voicing bar continues as the nasals are produced. You can also see the normal formants for the vowel /i/. Notice those formants fade as the nasals are produced. The nostrils do not radiate sound efficiently because of their fairly small apertures, and hair in the nostrils also attenuates the sound. Nasals create antiresonances that attenuate higher formants relative to those of neighboring vowels.

As stated earlier, with the nasals, the oral cavity is the side branch, and the side branch introduces a pole-zero pair. These antiresonances are frequency regions in which amplitudes of the source components are severely attenuated. The shorter the side branch, the closer the frequency of the pole-zero pair. To check your understanding, answer the following question.

 Show/hide comprehension question...


If there's no side branch, the pole and zero will be superimposed and will cancel each other out, leaving only the resonances of the main tube. You can see from the picture on the right that the second formant is not as attenuated in the syllable "ing." The nasal /m/ has the longest side branch, which means the lowest F2 frequency; the highest F2 frequency belongs to the nasal in the syllable "ing." Another way for the resonances and antiresonances to cancel each other out is if they are close enough in frequency. For nasals, the frequency of the antiresonance is in the region of 500 Hz. This frequency helps determine the identification of the nasal consonant. The nasal consonant has a fairly wide bandwidth as compared to oral cavity formants because the surface of the nasal concha has a lot of damping--it is a large, spongy area. If the nasal cavities are asymmetrical, which they usually are, each adds its own resonances. The paranasal sinuses also add resonances and anti-resonances.

Clinical importance

Understanding nasalization is important in three ways. The first pertains to coarticulation. As we have learned, nasality spreads to surrounding vowels because the velopharyngeal port can neither open nor close abruptly between nasal and non-nasal sounds. This "nasal spread" may be an important cue to the perception of sounds. You also may know that in some languages, such as French and Hindi, nasalization is a phonemic difference--nasal and non-nasal vowels distinguish the meaning of words. And finally, those with craniofacial anomalies and disorders such as dysarthria have difficulties in speech production because of velopharyngeal insufficiency. Having an understanding of the acoustic properties of nasals will help speech-language pathologists with assessment and treatment of such disorders, as well as help them communicate to other professionals why the speech of clients with these disorders may sound "muffled" or soft (from Hixon, Weismer, and Hoit, 2008, Preclinical Speech Science, Plural Publishing).


Affricates (e.g., the first sounds in "judge" and "chime")

With affricates, there is a silent interval followed by a more prolonged noisy onset. They share features of both stops and fricatives: there is a complete blockage of air, a release, and then a fricative-like noise. Both of the affricates in English are produced at the alveolar ridge. You can see a spectrogram of these affricates in your textbook, page 135, figure 6.15. There is also a picture here of the nonsense syllable "cha" with its voiceless affricate, which is marked. Notice the sound starts with a stop burst and is followed by the noise of the fricative.



The approximants or semivowels (e.g., /l/, /w/, /r/)

The approximants (also called semivowels) have a more open constriction than fricatives do; there is a relatively free flow of air, which produces no turbulence. All approximants are voiced, and the voicing continues throughout the production of the consonants. Approximants are similar to vowels in that the vocal tract is relatively open. They have a lower F1 than vowels do, and they also tend to have more formant movement than vowels--you can see the transitions clearly, for example, in the picture below, and you can see how they change with different vowels. The frequency changes in F2, and sometimes in F3, are important in the perception of the approximants. You can also see examples of the approximants/semivowels in your textbook, Figures 6.1 and 6.2, pages 117-118. As your textbook points out, the semivowels are produced with similar movements, and with similar acoustic results. The sounds /r/ and /l/ are often confused by speakers of some Asian languages, largely because their languages do not have a phonemic contrast between these two sounds. Additionally, young children often have difficulty producing /r/ and frequently substitute /w/ for it.



Summary of the consonants

We have reviewed the following concepts regarding consonants:

Here are two pictures of spectograms of different consonants, for your review.

consonants.1.png consonants2.png





Now that you have finished this module, I hope you have a better understanding of the production of the vowels and consonants, and the use of the spectrogram.  As a review, after studying this unit, you should be able to:

1. Describe movement patterns of the articulators and the relationship between movement and velocity.

2. Describe the coding system for vowels and consonants.

3. Describe how the acoustic properties of the vocal tract are determined.

4. Describe how the vocal tract shapes the input signal.

5. Describe digital techniques for speech analysis.

6. Segment and interpret a spectrogram.


Remember to complete the post-test for this module.  The post-test will serve as a study guide for the in-class exam, which covers spectrographic analysis, consonants, and vowels.