Extending the Bandwidth of Narrowband Speech Using Cepstral Linear Prediction

By: S. Jaisimha
Singapore Design Engineering Centre
Delphi Delco Electronics Systems

I. Y. Soon
School of Electrical and Electronic Engineering
Nanyang Technological University


In this paper we propose a new post-processing technique to recover the high-band frequency components that are lost when band-limited narrowband speech is low-pass filtered. The approach combines linear prediction of cepstral coefficients from narrowband speech with an excitation function to estimate the high-band spectral envelope of the original wideband speech signal. The method improves both the subjective and the objective quality of band-limited speech at the receiver side of a communication system.

1. Introduction

Currently, legacy PSTN and wireless networks limit speech to the 300–3400 Hz band to conserve frequency spectrum, a restriction imposed by their transmission bandwidths. Speech limited to 300–3400 Hz is sufficient for intelligibility but compromises subjective quality. Sounds such as fricatives contain significant energy in the 4000–8000 Hz high-frequency sub-band, so reproducing this band adds to the naturalness of the speech signal.
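The loss described above can be illustrated with a short sketch. It band-limits a 16 kHz test signal with an idealized brick-wall FFT mask; this is an assumption for illustration only, since real PSTN and wireless channels use practical filters and also resample to 8 kHz. The function names are hypothetical.

```python
import numpy as np


def telephone_band_limit(x, fs, low_hz=300.0, high_hz=3400.0):
    """Zero every FFT bin outside [low_hz, high_hz].

    Idealized brick-wall band-pass: a sketch, not a model of any real channel.
    """
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spec, n=len(x))


def band_energy(x, fs, low_hz, high_hz):
    """Sum of squared FFT magnitudes for bins in [low_hz, high_hz)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= low_hz) & (freqs < high_hz)
    return float(np.sum(np.abs(spec[band]) ** 2))


# A 1 kHz "low-band" tone plus a 5 kHz "high-band" tone, sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 5000 * t)
y = telephone_band_limit(x, fs)
print(band_energy(x, fs, 4000, 8000) > 0)      # True: input has high-band energy
print(band_energy(y, fs, 4000, 8000) < 1e-12)  # True: the channel removed it
```

Only the 1 kHz tone survives the channel; the 5 kHz component is exactly the kind of information a bandwidth-extension method must estimate.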

Although the Nyquist theorem limits a sampled signal to frequencies below half its sampling rate, the high-band spectral envelope can be extended to some extent by exploiting spectral redundancies in the lower sub-bands of narrowband speech.
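The bound in question is the Nyquist frequency. Assuming the usual 8 kHz sampling rate for narrowband telephone speech (not stated in this section) and the 16 kHz rate used for wideband speech, the arithmetic is:

```latex
f_{\max} = \frac{f_s}{2}, \qquad
f_s = 8\ \text{kHz} \;\Rightarrow\; f_{\max} = 4\ \text{kHz}, \qquad
f_s = 16\ \text{kHz} \;\Rightarrow\; f_{\max} = 8\ \text{kHz}.
```

Under that assumption the 4000–8000 Hz part of the target band cannot be represented at the narrowband rate at all, so it has to be estimated from the low band rather than recovered from the received samples.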

Various methods have been proposed for this kind of speech bandwidth extension. Avendano et al. [1] proposed an LPC all-pole filter model with high-band envelope prediction from time trajectories, using a predictor designed on training data. Julian Epps et al. [3] proposed a method that uses codebook mapping to assist in high-band spectral envelope resynthesis.
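Codebook mapping pairs entries of a narrowband feature codebook with high-band envelope entries learned from training data. The sketch below shows only the lookup step, using a nearest-neighbour search in Euclidean distance; the class name, the feature representation, and the toy codebooks are hypothetical and are not taken from [3].

```python
import numpy as np


class CodebookMapper:
    """Nearest-neighbour lookup in a paired codebook (hypothetical sketch).

    Row i of nb_codebook (narrowband features) is paired with row i of
    hb_codebook (high-band envelope parameters). Training the codebooks,
    e.g. by vector quantisation of aligned data, is not shown.
    """

    def __init__(self, nb_codebook, hb_codebook):
        self.nb = np.asarray(nb_codebook, dtype=float)
        self.hb = np.asarray(hb_codebook, dtype=float)
        if self.nb.shape[0] != self.hb.shape[0]:
            raise ValueError("codebooks must have the same number of entries")

    def map(self, nb_features):
        """Return the high-band entry paired with the closest narrowband entry."""
        x = np.atleast_2d(np.asarray(nb_features, dtype=float))
        # Squared Euclidean distance from every input row to every codeword.
        d2 = ((x[:, None, :] - self.nb[None, :, :]) ** 2).sum(axis=2)
        return self.hb[d2.argmin(axis=1)]


mapper = CodebookMapper(nb_codebook=[[0, 0], [1, 1], [5, 5]],
                        hb_codebook=[[10.0], [20.0], [30.0]])
print(mapper.map([[0.9, 1.2], [4.0, 4.5]]))  # selects entries 1 and 2: 20.0, 30.0
```

A hard nearest-neighbour lookup can only output envelopes that are already in the codebook, which is one motivation for a continuous mapping such as the linear predictor used in this paper.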

Hiroshi [2] used spectral folding and shaping filters to achieve high-band spectral recovery. More recently, I. Y. Soon et al. [7] improved the spectral folding technique of [2] by using a zero-crossing threshold to distinguish voiced from unvoiced segments and adjusting the spectral gain for each type of segment accordingly. Our contribution is a post-processing strategy, typically applied at the receiver side of the speech communication system. The proposed technique uses cepstral linear prediction to recover the 3400–8000 Hz band by exploiting its correlation with the lower 300–3400 Hz band.
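Both ideas in this paragraph can be sketched in a few lines. Upsampling by 2 through zero insertion, with no anti-imaging filter, mirrors the 0–4 kHz band into 4–8 kHz (spectral folding), and a zero-crossing rate (ZCR) gives a crude voiced/unvoiced decision that selects the gain applied to the folded band. The threshold, the two gains, and the flat FFT gain mask are hypothetical values chosen for illustration, not the parameters of [2] or [7]; those methods use shaping filters rather than a flat gain.

```python
import numpy as np

FS_WB = 16000  # wideband sampling rate after upsampling (Hz)


def spectral_fold(x_nb):
    """Upsample by 2 by inserting zeros, with no anti-imaging filter.

    The 0-4 kHz spectrum is mirrored into 4-8 kHz: a tone at f Hz gains an
    image at 8000 - f Hz. Scaling by 2 keeps the baseband amplitude unchanged.
    """
    y = np.zeros(2 * len(x_nb))
    y[::2] = 2.0 * np.asarray(x_nb, dtype=float)
    return y


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    negative = np.signbit(frame)
    return np.count_nonzero(negative[1:] != negative[:-1]) / (len(frame) - 1)


def scale_high_band(frame_wb, gain, fs=FS_WB, split_hz=4000.0):
    """Multiply every FFT bin at or above split_hz by gain."""
    spec = np.fft.rfft(frame_wb)
    freqs = np.fft.rfftfreq(len(frame_wb), d=1.0 / fs)
    spec[freqs >= split_hz] *= gain
    return np.fft.irfft(spec, n=len(frame_wb))


def fold_and_shape(frame_nb, zcr_threshold=0.3, gain_voiced=0.1, gain_unvoiced=0.5):
    """Fold one narrowband frame and apply a ZCR-dependent high-band gain.

    A high ZCR suggests an unvoiced (noise-like) frame, which gets the larger
    high-band gain. All three parameters are hypothetical.
    """
    unvoiced = zero_crossing_rate(frame_nb) > zcr_threshold
    gain = gain_unvoiced if unvoiced else gain_voiced
    return scale_high_band(spectral_fold(frame_nb), gain)
```

For example, folding a 1 kHz tone sampled at 8 kHz yields components at 1 kHz and 7 kHz in the 16 kHz output, and the ZCR of white noise (about 0.5) is far above that of a 200 Hz tone (about 0.05).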

We achieve this by mapping the cepstral coefficients of narrowband speech to the 16 kHz domain with a linear predictor trained on generic speech files. We generate the 16 kHz excitation signal by upsampling the narrowband speech by a factor of 2 and passing the result through a full-wave rectifier. Our simulations show that cepstral linear prediction yields lower spectral distortion than general non-linear processing methods such as [2]. We also present an objective comparison between speech enhanced by the cepstral method and speech enhanced by the method of I. Y. Soon et al. [7].
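The two steps in this paragraph can be sketched as follows. The sketch assumes real-cepstrum features from Hann-windowed frames, an affine least-squares predictor fitted on time-aligned narrowband/wideband cepstra (a narrowband frame of L samples at 8 kHz paired with the wideband frame of 2L samples at 16 kHz covering the same interval), and FFT-domain band-limited interpolation for the upsampling step. None of these choices, nor the function names, are taken from the paper; they are one plausible reading of the description.

```python
import numpy as np

EPS = 1e-10  # floor that keeps log() finite on empty FFT bins (assumed value)


def real_cepstrum(frame, n_coeffs):
    """First n_coeffs coefficients of the real cepstrum of a Hann-windowed frame."""
    frame = np.asarray(frame, dtype=float)
    log_mag = np.log(np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) + EPS)
    return np.fft.irfft(log_mag, n=len(frame))[:n_coeffs]


def fit_cepstral_predictor(c_nb, c_wb):
    """Least-squares affine map w such that [c_nb, 1] @ w approximates c_wb.

    c_nb: (frames, p_nb) narrowband cepstra; c_wb: (frames, p_wb) wideband
    cepstra of the same frames, taken from a training corpus.
    """
    x = np.hstack([c_nb, np.ones((c_nb.shape[0], 1))])
    w, *_ = np.linalg.lstsq(x, c_wb, rcond=None)
    return w


def predict_wideband_cepstrum(w, c_nb):
    """Apply the fitted map to one frame (1-D) or many frames (2-D)."""
    c_nb = np.atleast_2d(c_nb)
    return np.hstack([c_nb, np.ones((c_nb.shape[0], 1))]) @ w


def log_envelope(cep, n_fft):
    """Smoothed log-magnitude spectrum (n_fft // 2 + 1 bins) from a truncated cepstrum."""
    p = len(cep)
    if p > n_fft // 2 + 1:
        raise ValueError("too many cepstral coefficients for n_fft")
    full = np.zeros(n_fft)
    full[:p] = cep
    full[n_fft - p + 1:] = cep[1:][::-1]  # the real cepstrum is even
    return np.fft.rfft(full).real


def excitation_16k(x_nb):
    """Upsample 8 kHz speech by 2, then full-wave rectify it.

    Rectification (absolute value) is non-linear, so it creates harmonics
    that extend into the otherwise empty 4-8 kHz band.
    """
    x_nb = np.asarray(x_nb, dtype=float)
    n = len(x_nb)
    spec = np.zeros(n + 1, dtype=complex)  # rfft size for 2n samples
    x_spec = np.fft.rfft(x_nb)
    spec[:len(x_spec)] = x_spec
    if n % 2 == 0:
        spec[n // 2] *= 0.5  # the old Nyquist bin is shared by two images
    upsampled = 2.0 * np.fft.irfft(spec, n=2 * n)  # factor 2 restores amplitude
    return np.abs(upsampled)
```

How the predicted envelope and the rectified excitation are combined is not described in this section. One simple option, again an assumption, is to multiply the excitation spectrum by `np.exp(log_envelope(...))` in the 3400–8000 Hz bins only, leaving the received 300–3400 Hz band untouched.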
