
Category: Formant Synthesis

Formant Synthesis Part 2

Applying the basics of Formant Synthesis.

The basic idea is to take three bandpass filters connected in parallel. Here we use three SV filters set to their 2 Stage Bandpass mode, each one followed by a Level Adj module so we can adjust the three parameters we need:
1) Frequency
2) Resonance or “Q”
3) The audio level of each filter.
Each of these three filters represents a Formant.
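The parallel structure described above can be sketched outside SynthEdit in a few lines of Python. This is only an illustration under stated assumptions: it uses a generic "cookbook" band-pass biquad as a stand-in for the SV filter's 2 Stage Bandpass mode, and the names `bandpass_coeffs`, `biquad` and `formant_bank` are made up for this example.

```python
import math
import random

def bandpass_coeffs(freq_hz, q, sample_rate):
    """Generic band-pass biquad (unity gain at the centre), normalised so a0 == 1."""
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def biquad(samples, coeffs):
    """Run one biquad over a list of samples (direct form I)."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def formant_bank(samples, formants, sample_rate=44100):
    """Parallel band-pass filters summed together.

    formants is a list of (frequency in Hz, Q, level) tuples, one per formant.
    """
    mixed = [0.0] * len(samples)
    for freq_hz, q, level in formants:
        band = biquad(samples, bandpass_coeffs(freq_hz, q, sample_rate))
        mixed = [m + level * s for m, s in zip(mixed, band)]
    return mixed

# White noise in, three formants out (the Q and level values are arbitrary).
noise = [random.uniform(-1.0, 1.0) for _ in range(4410)]
shaped = formant_bank(noise, [(660, 10, 1.0), (1700, 10, 0.5), (2400, 10, 0.25)])
```

Each tuple carries the same three parameters as the list above: frequency, Q and level.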

By connecting white noise to the input and a frequency analyser to the output we can see the filter in operation (white noise makes the resonant peaks easier to see than a pulse or sawtooth would). You can see below how I have set some frequencies, and the resonant peaks corresponding to those frequencies are clearly visible.

This method has some disadvantages though:
1) The Pitch control of the stock SV filters is fixed at 1 Volt per Octave, meaning we have to do some maths and use extra modules to convert the control value into a Hz or kHz readout.
2) The stock SV filters tend to oscillate and ring at high resonance levels.
3) Three separate filters, plus the maths and extra modules for displaying the filter frequency, all add to the CPU use.

A more Efficient Formant filter.

Fortunately there is a purpose-built filter module in the TD modules range, the TD_SVX4.
It is a single module that does the job of four SV filters. Its Frequency control voltage is 1 Volt per kHz, which makes the readout easier, and the filters are optimised so as not to “ring” or oscillate at high resonance levels.
This makes quite a neat and more CPU efficient solution for a Formant filter.
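To make the difference between the two control scales concrete, here is a small sketch of the maths involved. The 1 Volt per Octave conversion needs a reference point; 5 V = 440 Hz is used here as an assumption, so check it against your own modules. The function names are made up for this example.

```python
import math

def volts_per_octave_to_hz(volts, ref_volts=5.0, ref_hz=440.0):
    """1 V/octave: each extra volt doubles the frequency.

    The reference point (5 V = 440 Hz) is an assumption, not taken from the article.
    """
    return ref_hz * 2.0 ** (volts - ref_volts)

def hz_to_volts_per_octave(hz, ref_volts=5.0, ref_hz=440.0):
    """Inverse of the above: the voltage needed for a given frequency."""
    return ref_volts + math.log2(hz / ref_hz)

def volts_per_khz_to_hz(volts):
    """1 V/kHz: the voltage is simply the frequency in kHz."""
    return volts * 1000.0

print(volts_per_octave_to_hz(6.0))   # one octave above the reference
print(hz_to_volts_per_octave(660.0)) # voltage for a 660 Hz formant
print(volts_per_khz_to_hz(0.66))     # the same formant on the linear scale
```

With the linear scale the control value can be shown directly as a kHz readout, which is why no extra conversion modules are needed.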

The structure of the dual TD_SVX_4 container is shown below. I used two filters to get a steeper bandpass response. You can see below the filter in operation with the four resonant peaks (again using a noise input). Even at maximum resonance there is no ringing or oscillation.
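Why two filters give a steeper response can be checked numerically. Assuming the two filters are connected in series and are identical, their gains multiply, so the attenuation in dB away from the peak doubles. The sketch below uses a generic band-pass biquad as a stand-in, not the TD_SVX4 itself, and the frequency and Q values are arbitrary.

```python
import cmath
import math

def bandpass_gain(freq_hz, centre_hz, q, sample_rate=44100):
    """Gain of a generic band-pass biquad at freq_hz (1.0 at the centre)."""
    w0 = 2.0 * math.pi * centre_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)  # z^-1
    numerator = alpha - alpha * z1 ** 2
    denominator = (1.0 + alpha) - 2.0 * math.cos(w0) * z1 + (1.0 - alpha) * z1 ** 2
    return abs(numerator / denominator)

def to_db(gain):
    return 20.0 * math.log10(gain)

# One octave above a 1 kHz peak with Q = 5.
one_stage = bandpass_gain(2000, 1000, 5)
two_stages = one_stage ** 2  # identical filters in series multiply their gains

print(round(to_db(one_stage), 1), "dB with one filter")
print(round(to_db(two_stages), 1), "dB with two filters in series")
```

The peak itself stays at the same level, since a gain of 1.0 squared is still 1.0; only the skirts get steeper.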

Not just for speech.

By their nature acoustic instruments also have resonant frequencies, so we can use our formant filter to make more accurate imitations of acoustic instruments by adding resonances to their audio spectrum. The chart is shown below; note how some resonances are quite broad compared to others.

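| Instrument | F1 (Hz) | F2 (Hz) |
|---|---|---|
| Flute | 800 | |
| Oboe | 1400 | 3000 |
| English Horn | 930 | 2300 |
| Clarinet | 1500-1700 | 3700-4300 |
| Bassoon | 440-500 | 1220-1280 |
| Trumpet | 1200-1400 | 2500 |
| Trombone | 600-800 | |
| Tuba | 200-400 | |
| French Horn | 400-500 | |
| Cello | 300 | 500 + 900 |
| Double bass | 70 | 250 |
| Viola | 220 | 350 |
| Violin | 503 | 1600 |
| Acoustic guitar | 90-150 | |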

More on acoustic instruments and formants.

Note: The Formant frequencies of acoustic stringed instruments are by no means set in stone. These vary from instrument to instrument, as there are a large number of variables affecting the sound, such as differences in construction, differences in materials etc. Different styles and models of acoustic guitar will have different resonances, but generally speaking the resonance will be in the lower frequency end of the scale as these are the weaker frequencies and need boosting to make the sound project evenly across the playing range.
Even “classical” instruments such as Violin and Viola will have different resonances due to the way they are constructed.

The best way to find the frequencies for these instruments is to experiment and find the sound that seems right to your ears, guided by your experience of acoustic instruments.

Formant Synthesis Part 1

Formants.

What are Formants?
In the study of acoustics, speech and phonetics a formant is a part of the audio spectrum with a large peak in volume, resulting from the acoustic resonances formed by the human vocal tract, or by a room or hall.
By their nature, when sound waves are created in a room some frequencies will be attenuated, and some will be boosted, due to the shape and size of the room. In a room there are not only the direct sound waves that we hear, but also the reflections from the walls, and from furniture. If you want to find out more about room resonances there is an article on Wikipedia.
This also applies to human speech, because the mouth and vocal tract form resonant cavities; only in this case, when we create speech, we are changing the dimensions of those cavities.

Formants in Acoustics.

In acoustics, a formant is usually defined as a broad peak, or local maximum, in level of the audible spectrum.
For harmonic sounds, the formant frequency is sometimes taken as the harmonic that is most strongly boosted by a natural resonance. A room can be said to have formants which are characteristic of that particular room. They come from its resonances, which are determined by the size and shape of the room and are also affected by its contents and by any sound damping materials in it. Room formants of this nature reinforce themselves by emphasizing specific frequencies and cancelling others. For the purposes of digital signal processing such as reverb, the way a collection of formants generated by a room or hall affects a signal can be represented by an impulse response.
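Applying an impulse response to a signal is done by convolution. Here is a minimal sketch with a toy impulse response (the direct sound plus one quieter reflection three samples later); a real room response would be many thousands of samples long and would normally be processed with a much faster FFT-based method.

```python
def convolve(signal, impulse_response):
    """Direct convolution: every input sample triggers a scaled copy of the response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out

# Toy "room": direct sound, then one reflection at half level.
toy_room = [1.0, 0.0, 0.0, 0.5]
print(convolve([1.0, 2.0], toy_room))  # [1.0, 2.0, 0.0, 0.5, 1.0]
```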

Speech Formants.

In speech the formants are characteristic of the resonances of the vocal tract. The lungs, the larynx, the vocal cords, the throat, the mouth, the palate, the shape and position of the tongue, the lips and the teeth all affect the formants.
These formants make up all the sounds we recognize as vowels, and to a lesser degree also make up the sound of the consonants too.
The formant with the lowest frequency is called F1, the second F2, the third F3, and so forth. The fundamental frequency (or pitch) of the voice is sometimes referred to as F0, but it is not a formant. Normally the first two formants, F1 and F2, are sufficient for us to recognize the vowel.
In normal voiced speech, the vibration produced by the vocal cords resembles a sawtooth, rich in harmonic overtones.
If the fundamental frequency or (more often) one of the overtones is higher than one of the resonance frequencies of the system, then that resonance will be only weakly excited, and the formant it usually produces will be weak or almost completely inaudible. This is most noticeable in the singing of operatic sopranos, who sing at pitches high enough for their vowel sounds to become very hard to distinguish.
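A quick calculation shows why this happens: a voiced sound only contains energy at whole-number multiples of the fundamental, so a resonance is only heard if a harmonic lands near it. The sketch below finds the harmonic closest to a formant. The two fundamental frequencies are illustrative values rather than measurements, and 270 Hz is the approximate F1 of "ee".

```python
def nearest_harmonic(f0_hz, formant_hz):
    """Return the harmonic of f0_hz that lies closest to formant_hz."""
    n = max(1, round(formant_hz / f0_hz))
    return n * f0_hz

f1_ee = 270  # approximate F1 of "ee" in Hz

for f0 in (100, 1000):  # a low speaking pitch and a very high sung pitch
    harmonic = nearest_harmonic(f0, f1_ee)
    print(f0, "Hz fundamental: nearest harmonic", harmonic,
          "Hz, which is", abs(harmonic - f1_ee), "Hz away from F1")
```

At 100 Hz the third harmonic sits only 30 Hz from the formant, so the resonance is well excited. At 1000 Hz the closest partial is the fundamental itself, 730 Hz above the formant, so the resonance has almost nothing to boost.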

Note: Consonants are, to a large degree, noise bursts shaped by the tongue and lips, and we can model these using amplitude contours rather than spectral shapes, so these won’t concern us here.

Approximate frequencies found in vowel sounds. (Not a comprehensive list)

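| Vowel | Example | F1 (Hz) | F2 (Hz) | F3 (Hz) |
|---|---|---|---|---|
| a | Lap | 660 | 1700 | 2400 |
| ee | Leap | 270 | 2300 | 3000 |
| i | Lip | 400 | 2000 | 2550 |
| oo | Loop | 300 | 870 | 2250 |
| “u” | Lug | 640 | 1200 | 2400 |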

This means that we can use these frequencies in a synthesizer to create roughly human sounding vowels, and “singing”.
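One quick way to try these values outside SynthEdit is additive synthesis: build a sawtooth-like tone from its harmonics and boost the harmonics that fall near each formant. This is a swapped-in technique for illustration, not a filter-based method, and the Q value, pitch and function names are assumptions chosen for this sketch.

```python
import math

# Approximate vowel formants in Hz, copied from the table above.
VOWEL_FORMANTS = {
    "a":  (660, 1700, 2400),   # lap
    "ee": (270, 2300, 3000),   # leap
    "i":  (400, 2000, 2550),   # lip
    "oo": (300, 870, 2250),    # loop
    "u":  (640, 1200, 2400),   # lug
}

def resonance_gain(freq_hz, centre_hz, q):
    """Gain of an ideal second-order band-pass: 1.0 at the centre frequency."""
    detune = freq_hz / centre_hz - centre_hz / freq_hz
    return 1.0 / math.sqrt(1.0 + (q * detune) ** 2)

def vowel_samples(vowel, f0_hz=110.0, q=8.0, seconds=0.5,
                  sample_rate=44100, max_hz=5000.0):
    """Sawtooth harmonics (amplitude 1/k) weighted by the three formant peaks.

    Harmonics above max_hz are left out to keep this sketch fast.
    """
    formants = VOWEL_FORMANTS[vowel]
    n_harmonics = int(max_hz // f0_hz)
    weights = [
        sum(resonance_gain(k * f0_hz, fc, q) for fc in formants) / k
        for k in range(1, n_harmonics + 1)
    ]
    n = int(seconds * sample_rate)
    out = [
        sum(w * math.sin(2.0 * math.pi * (k + 1) * f0_hz * t / sample_rate)
            for k, w in enumerate(weights))
        for t in range(n)
    ]
    peak = max(abs(s) for s in out) or 1.0
    return [s / peak for s in out]  # normalised to the range -1.0 .. 1.0

ee = vowel_samples("ee", seconds=0.05)
```

The returned list can be written to a WAV file or fed to any audio library to hear the result; changing the vowel key or `f0_hz` gives a rough idea of how the formants, rather than the pitch, define the vowel.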