Automated sentence boundary detection for spontaneous speech in Malay language / Muhammad Izzad Ramli

Ramli, Muhammad Izzad (2013) Automated sentence boundary detection for spontaneous speech in Malay language / Muhammad Izzad Ramli. Masters thesis, Universiti Teknologi MARA.



Sentence boundary detection (SBD), also known as sentence breaking, decides where sentences begin and end. It is necessary in many applications, such as speech summarization, video summarization, and speech document indexing and retrieval. This research describes sentence boundary detection in spontaneous Malay-language spoken audio. Spontaneous speech is speech that is not planned or arranged beforehand. Speech studies on spontaneous Malay speech are still lacking, and no work has been done on sentence boundaries. Previous studies showed that combining linguistic and acoustic approaches to sentence boundary detection gives better results than using either approach alone. However, because a linguistic model for the Malay language is not yet available, only the acoustic approach is used for Malay sentence boundary detection. Therefore, a combination of prosodic features with volume features and rate of speech (ROS) is proposed for sentence boundary detection in spontaneous speech. The data come from spontaneous speech in the Malaysian Parliament Hansard Document (MPHD). Experiments are conducted on 42 minutes of spontaneous Malay speech comprising 6,413 speech and non-speech segments. The non-speech segments are then selected as the candidates for the sentence boundary detection experiments. The proposed speech and non-speech detection method achieves an accuracy of 97.8%, and sentence boundary detection achieves 100% with a false alert rate of 19.44%. As an outcome, the proposed method, which fuses prosodic features, volume, and rate of speech (ROS) with AdaBoost, is able to detect and label sentence boundaries automatically.
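
To make the classification step described in the abstract more concrete, the following is a minimal sketch of AdaBoost with decision stumps applied to candidate non-speech segments. It is not the implementation used in the thesis. The feature layout (pause duration, volume, ROS), the function names, and every value in the toy data set are hypothetical, chosen only to illustrate how boosted threshold rules could label a candidate as a sentence boundary (+1) or not (-1).

```python
import math


def stump_predict(x, feature, threshold, polarity):
    """Decision stump: vote `polarity` if x[feature] > threshold, else the opposite."""
    return polarity if x[feature] > threshold else -polarity


def train_adaboost(X, y, rounds=5):
    """Train AdaBoost with decision stumps. Labels in y must be +1 or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []  # list of (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        best = None
        for feature in range(len(X[0])):
            values = sorted({x[feature] for x in X})
            # Candidate thresholds: midpoints between consecutive distinct values.
            for lo, hi in zip(values, values[1:]):
                threshold = (lo + hi) / 2
                for polarity in (1, -1):
                    err = sum(
                        w for w, xi, yi in zip(weights, X, y)
                        if stump_predict(xi, feature, threshold, polarity) != yi
                    )
                    if best is None or err < best[0]:
                        best = (err, feature, threshold, polarity)
        # Stop if no stump is available or the best one is no better than chance.
        if best is None or best[0] >= 0.5:
            break
        err, feature, threshold, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero and log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified samples, then renormalise.
        weights = [
            w * math.exp(-alpha * yi * stump_predict(xi, feature, threshold, polarity))
            for w, xi, yi in zip(weights, X, y)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Weighted vote of all stumps: +1 for boundary, -1 for non-boundary."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in model)
    return 1 if score >= 0 else -1


# Hypothetical candidates: [pause duration (s), relative volume, ROS (syllables/s)]
X = [
    [0.80, 0.20, 3.0], [0.65, 0.30, 3.5], [0.70, 0.25, 2.8], [0.30, 0.20, 2.5],
    [0.15, 0.70, 5.5], [0.20, 0.65, 6.0], [0.10, 0.80, 5.0], [0.60, 0.75, 5.8],
]
y = [1, 1, 1, 1, -1, -1, -1, -1]  # invented labels, for illustration only

model = train_adaboost(X, y, rounds=5)
predictions = [predict(model, x) for x in X]
```

In this toy data a single threshold already separates the two classes, so boosting converges immediately. Real prosodic features would overlap, and additional rounds would then combine several weak threshold rules into one decision.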


Item Type: Thesis (Masters)
Creator: Ramli, Muhammad Izzad
Thesis advisor: Jamil, Nursuriati (Assoc. Prof. Dr.)
Subjects: Q Science > QA Mathematics > Response surfaces (Statistics)
Q Science > QA Mathematics > Evolutionary programming (Computer science). Genetic algorithms
Q Science > QA Mathematics > Web databases
Divisions: Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences
Programme: Master of Sciences (CS 780)
Item ID: 47023
Uncontrolled Keywords: Sentence boundary detection (SBD), Malay, Language


Fulltext is available at:
  • Koleksi Akses Terhad (Restricted Access Collection) | PTAR Utama | Shah Alam