Using DSP Technology to Optimize Speech Recognition Performance
In the Real World
Consider an in-car telematics or navigation system that accepts voice instructions. Such a system may be represented by the diagram below.

The next figure shows a block diagram of the speech recognition parts of this system, used together with noise and echo reduction technology.

The voice of the driver is picked up by the communications microphone and is first processed by the block labelled RNF (Referenced Noise Filter), a type of echo canceller. This block has a direct feed from the music system, so that this background noise can be reduced. If the vehicle is an emergency services vehicle, this feed might well come from the siren.
The RNF block may also have a feed from the Automatic Speech Recognition (ASR) module itself. This is so that if the ASR system is talking to the driver, the driver can speak over the automated voice and still be understood. The RNF technology ensures that the ASR hears the driver's voice, but not the sound of its own voice coming out of the loudspeakers. This ability, called "barge-in", is an important aspect of interactive speech recognition systems: the driver can choose an item from a menu without waiting until the system has listed all the possibilities.
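The internals of the RNF block are not described here. As a rough illustration of how a filter with a reference feed can remove a known signal from the microphone, the sketch below uses a normalised least-mean-squares (NLMS) adaptive filter, a common textbook technique that is assumed for the example and is not necessarily what RNF uses. The function name, the cabin path coefficients and the synthetic signals are all hypothetical.

```python
import math
import random


def nlms_cancel(mic, reference, taps=8, mu=0.2, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `mic`.

    Illustrative NLMS sketch, not the actual RNF algorithm.
    Returns the residual (microphone signal minus the estimated echo).
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # recent reference samples, newest first
    residual = []
    for d, x in zip(mic, reference):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # Normalise the step by the reference energy in the filter window.
        step = mu * e / (sum(h * h for h in history) + eps)
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual


def power(signal):
    """Mean square value of a list of samples."""
    return sum(s * s for s in signal) / len(signal)


random.seed(0)
n = 4000
# Reference feed: white noise standing in for the music system output.
music = [random.uniform(-1.0, 1.0) for _ in range(n)]
# Hypothetical loudspeaker-to-microphone path (a short FIR filter).
path = [0.6, 0.3, -0.2, 0.1]
echo = [
    sum(path[k] * music[i - k] for k in range(len(path)) if i - k >= 0)
    for i in range(n)
]
# A low-level tone standing in for the driver's voice.
speech = [0.1 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
mic = [s + e for s, e in zip(speech, echo)]

cleaned = nlms_cancel(mic, music)

# Compare the non-speech power before and after, once the filter has adapted.
tail = slice(n // 2, n)
echo_before = power([m - s for m, s in zip(mic[tail], speech[tail])])
echo_after = power([c - s for c, s in zip(cleaned[tail], speech[tail])])
print(echo_before, echo_after)
```

For barge-in, the same function could be given the ASR prompt audio as `reference` instead of the music feed. A real system would also need to handle much longer acoustic paths and changing cabin conditions, which this sketch ignores.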
Once the 'echo' has been removed, the signal is processed by the VRE (Voice Recognition Enhancer) block. This technology can provide in the region of 6-18 dB of noise reduction with minimal damage to the speech element of the signal. From here the speech is fed into the ASR module.
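To put the 6-18 dB figure in perspective, the short sketch below converts decibels into linear power and amplitude ratios using the standard definitions. The helper names and the 5 dB input signal-to-noise ratio are hypothetical, and the last function assumes the speech level is left unchanged by the noise reduction.

```python
def db_to_power_ratio(db):
    """Linear power ratio for a level difference in decibels."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Linear amplitude ratio for a level difference in decibels."""
    return 10 ** (db / 20)


def snr_after_reduction(snr_in_db, reduction_db):
    """Output SNR, assuming the speech level itself is untouched."""
    return snr_in_db + reduction_db


for reduction in (6, 18):
    print(
        reduction,
        "dB: noise power divided by",
        round(db_to_power_ratio(reduction), 1),
        "and noise amplitude divided by",
        round(db_to_amplitude_ratio(reduction), 1),
    )

# Hypothetical example: speech 5 dB above the cabin noise going in.
print(snr_after_reduction(5, 6), "to", snr_after_reduction(5, 18), "dB SNR out")
```

In other words, 6 dB of reduction cuts the noise power to roughly a quarter, while 18 dB cuts it to less than a sixtieth.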
Because the VRE technology allows voice-type signals to pass through, the voices of passengers, or even music from their portable sound systems, might still corrupt the speech entering the ASR module. In some cases, a second microphone can be used to pick up the unwanted sound, so that the ENR (Enhanced Noise Reduction) block can remove this noise and the speech entering the ASR module is as clear as possible. Such dual-microphone noise reduction technology is particularly applicable to cellular phones and headsets, where the communications microphone picks up significantly more speech than the second microphone, while the noise at both is fairly correlated.
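The property described above, more speech at the communications microphone but similar noise at both, can be exploited in several ways, and the method ENR actually uses is not stated here. One simple idea, sketched below as an assumption, is to compare the frame-by-frame power at the two microphones and treat frames where the communications microphone is clearly louder as the talker's speech. The function names, frame length, threshold and synthetic signals are hypothetical.

```python
import math
import random


def frame_power_db(frame, floor=1e-12):
    """Mean power of one frame in dB (floored to avoid log of zero)."""
    return 10 * math.log10(sum(s * s for s in frame) / len(frame) + floor)


def speech_frames(primary, secondary, frame_len=160, threshold_db=3.0):
    """Flag frames where the primary mic is louder than the secondary mic.

    Illustrative power-level-difference detector, not the ENR algorithm.
    """
    usable = min(len(primary), len(secondary))
    flags = []
    for start in range(0, usable - frame_len + 1, frame_len):
        p = frame_power_db(primary[start:start + frame_len])
        s = frame_power_db(secondary[start:start + frame_len])
        flags.append(p - s > threshold_db)
    return flags


random.seed(1)
frame_len = 160
n = frame_len * 10
# Simplification: identical noise at both mics (fully correlated).
noise = [random.gauss(0.0, 0.1) for _ in range(n)]
# A tone standing in for speech, present only in the second half.
speech = [
    math.sin(2 * math.pi * 0.05 * i) if i >= frame_len * 5 else 0.0
    for i in range(n)
]
primary = [sp + nz for sp, nz in zip(speech, noise)]
# The second microphone picks up only a small fraction of the speech.
secondary = [0.1 * sp + nz for sp, nz in zip(speech, noise)]

flags = speech_frames(primary, secondary)
print(flags)
```

Frames flagged in this way could then be passed on, while the remaining frames give an estimate of the noise to be suppressed. A practical design would work per frequency band and smooth its decisions over time.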
