Abstract
Sentence boundary detection (SBD), also known as sentence breaking, decides where sentences begin and end. It is necessary in many applications, such as speech summarization, video summarization, and speech document indexing and retrieval. This research addresses sentence boundary detection in spontaneous Malay-language spoken audio. Spontaneous speech is speech that is not planned or arranged beforehand. Speech studies on spontaneous Malay are still lacking, and no work has been done on sentence boundary detection. Previous studies showed that combining linguistic and acoustic approaches for sentence boundary detection gives better results than using either approach alone. However, because a linguistic model for the Malay language is not yet available, only the acoustic approach is used here. Therefore, a combination of prosodic features with volume features and rate of speech (ROS) is proposed for sentence boundary detection in spontaneous speech. The data are spontaneous speeches from the Malaysian Parliament Hansard Document (MPHD). Experiments are conducted on 42 minutes of spontaneous Malay speech comprising 6,413 speech and non-speech segments. The non-speech segments are then selected as the candidates for the sentence boundary detection experiments. The proposed speech and non-speech detection method achieves an accuracy of 97.8%, and sentence boundary detection reaches 100% with a false alarm rate of 19.44%. As a result, the proposed method, which fuses prosodic features, volume and rate of speech (ROS) with AdaBoost, is able to detect and label sentence boundaries automatically.
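The candidate-selection step described in the abstract (non-speech segments become sentence boundary candidates) can be illustrated with a minimal sketch. It assumes a simple volume threshold on non-overlapping frames; the function names, frame length, threshold and minimum pause length are hypothetical and are not taken from the thesis, whose actual detector, prosodic and ROS features, and AdaBoost classifier are not specified in this record.

```python
import math

def frame_rms(samples, frame_len):
    """Root-mean-square volume of each non-overlapping frame."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def non_speech_segments(samples, frame_len=160, threshold=0.05, min_frames=3):
    """Return (start_frame, end_frame) runs whose volume stays below threshold.

    end_frame is exclusive. threshold and min_frames are illustrative values,
    not settings from the thesis.
    """
    volumes = frame_rms(samples, frame_len)
    segments, start = [], None
    for idx, vol in enumerate(volumes):
        if vol < threshold:
            if start is None:
                start = idx  # a quiet run begins here
        else:
            if start is not None and idx - start >= min_frames:
                segments.append((start, idx))
            start = None
    # Close a quiet run that lasts until the end of the signal.
    if start is not None and len(volumes) - start >= min_frames:
        segments.append((start, len(volumes)))
    return segments

# Synthetic signal (hypothetical test data): tone, silence, tone at 16 kHz.
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(1600)]
silence = [0.0] * 800
signal = tone + silence + tone
candidates = non_speech_segments(signal)
```

In a fuller pipeline, each candidate pause would be described by features such as its duration and the surrounding pitch, volume and speaking rate, and a classifier would then decide whether it marks a sentence boundary.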
Metadata
Item Type: | Thesis (Masters) |
---|---|
Creators: | Ramli, Muhammad Izzad (ID Num. 2011289094) |
Contributors: | Thesis advisor: Jamil, Nursuriati (Assoc. Prof. Dr.) |
Subjects: | Q Science > QA Mathematics > Response surfaces (Statistics); Q Science > QA Mathematics > Evolutionary programming (Computer science). Genetic algorithms; Q Science > QA Mathematics > Web databases |
Divisions: | Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences |
Programme: | Master of Sciences (CS 780) |
Keywords: | Sentence boundary detection (SBD), Malay, Language |
Date: | 2013 |
URI: | https://ir.uitm.edu.my/id/eprint/47023 |