Extending the Bandwidth of NarrowBand Speech Using Cepstral Linear Prediction
2. System Description
The block diagram of our proposal is shown below.

Diagram 1: Proposed Decoding Scheme
The input speech signal s(n), band-limited to the 300-3400 Hz range, is first transformed into the cepstral domain. Voiced speech can be modeled as a linear process: in the frequency domain, the speech spectrum is the product of the vocal tract frequency response and the excitation's response.
The cepstrum is a homomorphic technique based on the logarithm, which turns this product into a sum so that the two components can be separated. After the cepstral transformation, the low-order cepstral coefficients model the vocal tract spectral dynamics, while the higher-order coefficients carry the pitch information.
In our setup we assume that the first 8 cepstral coefficients contain the vocal tract information needed for speech reproduction during the enhancement stage. Equation (1) shows this property:
$$c(n) = \mathcal{F}^{-1}\{\log|V(e^{j\omega})|\} + \mathcal{F}^{-1}\{\log|P(e^{j\omega})|\} \qquad (1)$$
where,
$c(n)$ represents the cepstral coefficients,
$V(e^{j\omega})$ represents the vocal tract,
$P(e^{j\omega})$ represents the pitch excitation and
$\mathcal{F}^{-1}$ represents the inverse Fourier Transform
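The additive property in Equation (1) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the decaying exponential `v` and the decaying pulse train `p` are hypothetical stand-ins for a vocal tract response and a pitch excitation, the frame length of 64 and the pitch lag of 16 samples are arbitrary choices, and a naive DFT replaces the FFT to keep the example dependency-free.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for a short illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def cepstrum_of(spectrum):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    log_mag = [math.log(abs(s)) for s in spectrum]
    return [c.real for c in dft(log_mag, inverse=True)]

N = 64          # illustrative length (assumption, not the paper's frame size)
PITCH_LAG = 16  # hypothetical pitch period in samples

# Toy "vocal tract" impulse response: a smooth decaying exponential.
v = [0.9 ** t for t in range(N)]
# Toy "excitation": a decaying pulse train with period PITCH_LAG.
p = [0.6 ** (t // PITCH_LAG) if t % PITCH_LAG == 0 else 0.0 for t in range(N)]

V = dft(v)
P = dft(p)
S = [a * b for a, b in zip(V, P)]  # speech spectrum as the product V * P

c_s = cepstrum_of(S)
c_v = cepstrum_of(V)
c_p = cepstrum_of(P)

# Equation (1): the cepstrum of the product is the sum of the two cepstra.
max_err = max(abs(cs - (cv + cp)) for cs, cv, cp in zip(c_s, c_v, c_p))
# The excitation cepstrum peaks at the pitch lag, away from the low-order terms.
pitch_peak = max(range(1, N // 2), key=lambda n: c_p[n])

print("max deviation from additivity:", max_err)
print("excitation cepstrum peak at index:", pitch_peak)
```

In this toy case the vocal tract contribution is concentrated in the low-order coefficients while the excitation contributes at the pitch lag and its multiples, which is the separation the text relies on when it keeps only the first few coefficients.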
Our system processes consecutive speech frames of 256 samples with 50% overlap, tapering each frame with a Hanning window to reduce spectral leakage before cepstral conversion. Since cepstral conversion requires a transform into the frequency domain, we use the FFT in the cepstral conversion stage.
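The framing and cepstral conversion steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the system's actual code: the 8 kHz sampling rate, the synthetic two-tone test signal, the `eps` guard against log(0), the symmetric form of the Hanning window, and the choice to count the 8 retained coefficients from index 0 are all assumptions not given in the text. A small radix-2 FFT is written out so the example needs only the standard library.

```python
import cmath
import math

FRAME_LEN = 256        # frame size stated in the text
HOP = FRAME_LEN // 2   # 50% overlap
N_CEPS = 8             # low-order coefficients kept for the vocal tract

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def ifft(x):
    """Inverse FFT via the conjugation identity."""
    n = len(x)
    y = fft([v.conjugate() for v in x])
    return [v.conjugate() / n for v in y]

def hann(n):
    """Symmetric Hanning window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frames(signal, frame_len=FRAME_LEN, hop=HOP):
    """Yield overlapping frames; a trailing partial frame is dropped."""
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield signal[start:start + frame_len]

def real_cepstrum(frame, eps=1e-12):
    """c(n) = IFFT(log|FFT(frame)|); eps (an assumption) avoids log(0)."""
    spectrum = fft(frame)
    log_mag = [complex(math.log(abs(s) + eps)) for s in spectrum]
    return [c.real for c in ifft(log_mag)]

def vocal_tract_cepstra(signal):
    """Return the first N_CEPS cepstral coefficients of every windowed frame."""
    window = hann(FRAME_LEN)
    result = []
    for frame in frames(signal):
        tapered = [s * w for s, w in zip(frame, window)]
        result.append(real_cepstrum(tapered)[:N_CEPS])
    return result

# Usage with a synthetic signal (assumed 8 kHz sampling rate).
FS = 8000
signal = [math.sin(2 * math.pi * 440 * t / FS)
          + 0.5 * math.sin(2 * math.pi * 1200 * t / FS)
          for t in range(1024)]
ceps = vocal_tract_cepstra(signal)
print(len(ceps), "frames,", len(ceps[0]), "coefficients each")
```

With a hop of 128 samples, each sample away from the signal edges is covered by two frames, so the tapering applied by the window at the frame boundaries is compensated by the neighbouring frame.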
