Extending the Bandwidth of NarrowBand Speech Using Cepstral Linear Prediction
5. Enhanced Speech Signal
In cepstrum theory, the cepstral coefficients c(n) lie in the quefrency domain, which is analogous to a time-domain representation.
We re-synthesize the enhanced signal by multiplying the excitation and the vocal tract response in the frequency domain, rather than convolving their impulse responses in the time domain, to avoid the phase distortions that linear time-domain convolution can introduce.
To do this, we apply an inverse cepstral transformation to the predicted cepstral coefficient set. The inverse transformation uses a 512-point FFT to map the coefficients from the quefrency domain to the spectral domain. The excitation signal is likewise converted to the frequency domain with a 512-point FFT before it is multiplied with the output of the inverse cepstral stage. The excitation signal is scaled so that the energy of the synthesized signal matches that of the original, as noted in [4]. A whitening filter ensures that the excitation spectrum is flat before it is multiplied with the vocal tract spectrum, which prevents distortions arising from a non-uniform excitation spectrum.
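As an illustration of the inverse cepstral stage, the following is a minimal sketch in Python with NumPy. It assumes the predicted coefficients form a real cepstrum of the log-magnitude spectrum, which the text does not state explicitly. The function name `cepstrum_to_spectrum` and the way the low-order coefficients are mirrored are choices made for this example, not taken from the paper.

```python
import numpy as np

NFFT = 512  # FFT length used in the text


def cepstrum_to_spectrum(c, nfft=NFFT):
    """Map low-order real cepstral coefficients c[0..p-1] to a magnitude spectrum.

    Assumption: c is a real cepstrum, i.e. the inverse FFT of a log-magnitude
    spectrum, so the full cepstrum is even-symmetric in quefrency.
    """
    c = np.asarray(c, dtype=float)
    p = len(c)
    sym = np.zeros(nfft)
    sym[:p] = c
    # Mirror c[1..p-1] into the upper quefrencies to enforce even symmetry.
    sym[nfft - p + 1:] = c[1:][::-1]
    # The FFT of an even-symmetric real sequence is real: the log magnitude.
    log_mag = np.fft.fft(sym).real
    return np.exp(log_mag)
```

Because the mirrored sequence is real and even, the result is a real, symmetric magnitude spectrum of length 512 that can be multiplied bin by bin with the excitation spectrum.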
The enhanced speech frame is then obtained by applying an IFFT to the product of the two spectra.
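The frame synthesis described in this section can be sketched as follows in Python with NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `synthesize_frame` and its parameters are hypothetical, the vocal tract spectrum is assumed to be a real, symmetric magnitude spectrum of length 512, and energy matching is applied as a single gain on the output frame, which for this linear operation is equivalent to scaling the excitation. The whitening filter is omitted because the text does not specify its design.

```python
import numpy as np

NFFT = 512  # FFT length used in the text


def synthesize_frame(excitation, envelope_mag, target_energy=None, nfft=NFFT):
    """Multiply excitation and vocal tract spectra, then return to time domain.

    excitation    : time-domain excitation frame (zero-padded to nfft)
    envelope_mag  : assumed real, symmetric vocal tract magnitude spectrum
    target_energy : optional energy of the original frame to match (assumption)
    """
    exc_spec = np.fft.fft(excitation, n=nfft)   # 512-point FFT of the excitation
    product = exc_spec * envelope_mag           # frequency-domain multiplication
    frame = np.fft.ifft(product).real           # IFFT gives the enhanced frame
    if target_energy is not None:
        energy = np.sum(frame ** 2)
        if energy > 0:
            # Gain chosen so the output energy equals the target energy.
            frame = frame * np.sqrt(target_energy / energy)
    return frame
```

With a flat envelope of ones the function returns the zero-padded excitation unchanged, which is a quick sanity check that the FFT and IFFT stages are consistent.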
