PO-CHUAN LIN+, JHING-FA WANG, JIA-CHING WANG AND JUN-JIN HUANG
+Department of Electronics Engineering and Computer Science
Tung Fang Institute of Technology
Kaohsiung, 829 Taiwan
Department of Electrical Engineering
National Cheng Kung University
Tainan, 701 Taiwan
Conventional spoken sentence retrieval (SSR) relies on a large-vocabulary continuous-speech
recognition (LVCSR) system. This investigation proposes a feature-based, speaker-dependent
SSR algorithm using two-level matching. Users can speak keywords as the
query input and obtain similarity rankings of the sentences in a spoken sentence database. For instance, if
a user is looking for the personal spoken sentence "October 12, I have a meeting
in New York" in the database, an appropriate query input could be "meeting",
"New York", or "October". In the first level, a Similar Frame Tagging scheme is proposed
to locate possible segments of the database sentences that are similar to the user's query
utterance. In the second level, a Fine Similarity Evaluation between the query and each
possible segment is performed. Because the comparison is feature-based, the proposed algorithm
does not require acoustic or language models; thus, our SSR algorithm is language-independent.
Effective feature selection is the second issue addressed in this paper. In addition to the
conventional mel-frequency cepstral coefficients (MFCCs), several MPEG-7 audio low-level
descriptors (LLDs) are also used as features to examine their effectiveness for SSR. Experimental
results revealed that the retrieval performance using MPEG-7 audio LLDs
was close to that of the MFCCs, and that combining the MPEG-7 audio LLDs with the MFCCs
could further improve retrieval precision. Owing to feature-based matching, the
proposed algorithm has the advantages of being language-independent and requiring no
speaker-dependent training. Compared with the original methods [10, 11], the numbers of
additions and multiplications are reduced by a factor of approximately $l_q$ (the number of
frames in the query), at the cost of only a 0.026-0.05 decrease in precision. The algorithm is
therefore particularly suitable for use in resource-limited devices.
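The two-level idea summarized above can be illustrated with a small sketch. It first cheaply tags database frames that resemble the query, then runs a finer comparison only on the segments those tags delimit. This is a hypothetical illustration, not the authors' Similar Frame Tagging or Fine Similarity Evaluation. Three stand-ins are assumptions:

* Level 1 tags a frame when its nearest query frame lies within a fixed `threshold`.
* Level 2 uses classic dynamic time warping (DTW).
* MFCC and LLD features are fused by plain per-frame concatenation.

All function names (`tag_similar_frames`, `candidate_segments`, `dtw_distance`, `rank_sentences`, `combine_features`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combine_features(mfcc: List[Frame], lld: List[Frame]) -> List[List[float]]:
    """Hypothetical fusion: concatenate each frame's MFCC and MPEG-7 LLD vectors."""
    if len(mfcc) != len(lld):
        raise ValueError("MFCC and LLD streams must have the same number of frames")
    return [list(m) + list(l) for m, l in zip(mfcc, lld)]


def tag_similar_frames(query: List[Frame], sentence: List[Frame], threshold: float) -> List[bool]:
    """Level 1 stand-in: tag a sentence frame if some query frame lies within `threshold`."""
    if not query:
        raise ValueError("query must contain at least one frame")
    return [min(euclidean(f, q) for q in query) <= threshold for f in sentence]


def candidate_segments(tags: List[bool], query_len: int) -> List[Tuple[int, int]]:
    """Turn each run of tagged frames into a [start, end) segment."""
    segments = []
    i, n = 0, len(tags)
    while i < n:
        if not tags[i]:
            i += 1
            continue
        j = i
        while j < n and tags[j]:
            j += 1
        # Extend short runs toward the query length, clipped at the sentence end.
        segments.append((i, min(n, max(j, i + query_len))))
        i = j
    return segments


def dtw_distance(a: List[Frame], b: List[Frame]) -> float:
    """Level 2 stand-in: classic DTW cost divided by len(a) + len(b)."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m] / (n + m)


def score_sentence(query: List[Frame], sentence: List[Frame], threshold: float) -> float:
    """Lowest fine-level distance over all candidate segments; inf if none were tagged."""
    tags = tag_similar_frames(query, sentence, threshold)
    segments = candidate_segments(tags, len(query))
    if not segments:
        return math.inf
    return min(dtw_distance(query, sentence[s:e]) for s, e in segments)


def rank_sentences(query: List[Frame], database: List[List[Frame]], threshold: float) -> List[int]:
    """Return database indices ordered from most to least similar to the query."""
    scored = [(score_sentence(query, s, threshold), idx) for idx, s in enumerate(database)]
    return [idx for _, idx in sorted(scored)]


# Toy 1-D "features" for illustration only.
query = [[0.0], [1.0], [2.0]]
database = [
    [[5.0], [5.0], [0.0], [1.0], [2.0], [5.0]],  # contains an exact match
    [[9.0], [9.0], [9.0]],                       # no frame close to the query
    [[0.0], [1.2], [3.0]],                       # partial match
]
print(rank_sentences(query, database, threshold=0.5))  # [0, 2, 1]
```

Sentences with no tagged frames are never passed to the fine-level comparison, which is where a two-level scheme saves work. However, this naive tagging still compares every query frame with every database frame. The sketch therefore does not attempt to reproduce the roughly $l_q$-fold cost reduction reported above.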
Received June 27, 2007; revised November 6, 2007; accepted January 31, 2008.
Communicated by Liang-Gee Chen.
* This paper was presented at the 8th Australian and New Zealand Conference on Intelligent Information
Systems (ANZIIS 2003), Sydney, Vol. 12, 2003, pp. 9-14. This work was sponsored by the Department of
Industrial Technology, Ministry of Economic Affairs, Taiwan, R.O.C.