WEI-HO TSAI1,2 AND SHIH-JIE LIAO2
1Department of Electronic Engineering
2Graduate Institute of Computer and Communication Engineering
National Taipei University of Technology
Taipei, 106 Taiwan
Although the problem of automatic speaker identification has received considerable
attention, no work has been done to deal with overlapping speech that involves multiple
persons speaking simultaneously. This study proposes two approaches to automatically
identify both simultaneous and non-simultaneous speakers in an audio stream. The first
approach consists of an overlapping-speech detection component that determines if a test
audio recording contains overlapping speech, followed by either a single-speaker identifier
or a two-speaker identifier based on Gaussian mixture models. The second approach
runs the single-speaker identifier and two-speaker identifier in parallel. Recognizing that
the pairs of speakers can be vast in number, we propose using the parallel model combination
technique to characterize the simultaneous voices of two speakers based on the individual
voice of each speaker. Our experimental results demonstrate the feasibility of the
proposed approaches.
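
The second approach can be illustrated with a minimal sketch, which should be read as an assumption-laden toy rather than the authors' implementation. It scores a test recording against every single-speaker Gaussian mixture model and every two-speaker model derived from a pair of single-speaker models, then returns the best-scoring hypothesis. The function names (gmm_loglik, combine_pair, identify), the data layout, and the decision rule are hypothetical. The pair model here uses a simple log-max combination over log-spectral features, chosen only for brevity; the abstract does not state which combination formula the paper uses.

```python
import math
from itertools import combinations

def gmm_loglik(frames, gmm):
    """Total log-likelihood of frames under a diagonal-covariance GMM.

    gmm is a list of (weight, mean, variance) components, where mean and
    variance are equal-length lists of floats (hypothetical layout).
    """
    total = 0.0
    for x in frames:
        logs = []
        for w, mean, var in gmm:
            ll = math.log(w)
            for xi, mi, vi in zip(x, mean, var):
                ll -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            logs.append(ll)
        # log-sum-exp over mixture components for numerical stability
        top = max(logs)
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total

def combine_pair(gmm_a, gmm_b):
    """Build a two-speaker GMM from two single-speaker GMMs.

    ASSUMPTION: features are log-spectral energies, so the louder source
    dominates each dimension (log-max approximation). This stands in for
    the paper's parallel model combination, whose details are not given here.
    """
    combined = []
    for wa, ma, va in gmm_a:
        for wb, mb, vb in gmm_b:
            mean = [max(a, b) for a, b in zip(ma, mb)]
            # take the variance of whichever component has the larger mean
            var = [x if a >= b else y for a, b, x, y in zip(ma, mb, va, vb)]
            combined.append((wa * wb, mean, var))
    return combined

def identify(frames, speaker_gmms):
    """Score all single-speaker and speaker-pair hypotheses; return the best."""
    scores = {(name,): gmm_loglik(frames, g) for name, g in speaker_gmms.items()}
    for a, b in combinations(sorted(speaker_gmms), 2):
        pair_gmm = combine_pair(speaker_gmms[a], speaker_gmms[b])
        scores[(a, b)] = gmm_loglik(frames, pair_gmm)
    return max(scores, key=scores.get)
```

Because pair models are derived on demand from the N single-speaker models, no overlapping-speech training data is needed for the N(N-1)/2 possible pairs, which is the motivation the abstract gives for model combination.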
Received September 11, 2008; revised December 30, 2008; accepted April 9, 2009.
Communicated by Chin-Teng Lin.