Detection of usable speech using linear prediction coefficients
“Co-channel speech” refers to a speech signal in which one speaker's speech is corrupted by that of an additional speaker. Segments of such two-speaker speech that can still serve as input to a speaker identification system are termed “usable speech”. The purpose of a usable speech detection system is to identify these segments. It would act as a front end to a speaker identification system, thereby improving that system's performance and accuracy.
Portions of usable speech occur when high-energy voiced speech from the target speaker overlaps with low-energy speech from an interfering speaker, or vice versa. This research proposes a novel technique for detecting usable speech using Linear Prediction Coefficients (LPC): the width of the first peak of the vocal tract frequency response serves as the detection measure. The Target-to-Interferer Ratio (TIR) is used as a benchmark for comparing the results.
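As a rough illustration of the two quantities named above, the following Python sketch estimates the width of the first peak of an LPC-derived spectral envelope for a single frame, and computes a TIR for a frame whose target and interferer signals are known separately. Several details are assumptions, not taken from this work: the LPC order, the Hamming window, the FFT size, measuring the width at 3 dB below the peak, and defining TIR as an energy ratio in dB. The function names `lpc`, `first_peak_width_hz`, and `tir_db` are hypothetical, and no decision threshold is implied.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients [1, a1, ..., ap] via the autocorrelation method
    (Levinson-Durbin recursion). A Hamming window is assumed."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12  # small guard against an all-zero frame
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = -np.dot(a[:i], r[i:0:-1]) / err
        # Update coefficients a[0..i]; a[i] is still zero before this step.
        a[: i + 1] = a[: i + 1] + k * a[: i + 1][::-1]
        err *= 1.0 - k * k
    return a

def first_peak_width_hz(frame, fs, order=12, n_fft=1024, drop_db=3.0):
    """Width (Hz) of the first local maximum of the LPC envelope 1/|A(f)|,
    measured drop_db below the peak (an assumed definition of 'width').
    Returns None if the envelope has no interior peak."""
    a = lpc(frame, order)
    mag_db = -20.0 * np.log10(np.abs(np.fft.rfft(a, n_fft)) + 1e-12)
    is_peak = (mag_db[1:-1] > mag_db[:-2]) & (mag_db[1:-1] >= mag_db[2:])
    peaks = np.flatnonzero(is_peak) + 1
    if peaks.size == 0:
        return None
    p = peaks[0]
    level = mag_db[p] - drop_db
    lo = p
    while lo > 0 and mag_db[lo] > level:
        lo -= 1
    hi = p
    while hi < len(mag_db) - 1 and mag_db[hi] > level:
        hi += 1
    return (hi - lo) * fs / n_fft  # rfft bin spacing is fs / n_fft

def tir_db(target, interferer):
    """Target-to-Interferer Ratio in dB, assumed here to be the ratio of
    frame energies of the separately known target and interferer."""
    eps = 1e-12
    e_t = np.sum(np.asarray(target, dtype=float) ** 2) + eps
    e_i = np.sum(np.asarray(interferer, dtype=float) ** 2) + eps
    return 10.0 * np.log10(e_t / e_i)
```

In an evaluation along these lines, the per-frame peak width from the co-channel mixture would be compared against the per-frame TIR, which is available only when the target and interferer recordings are mixed artificially.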