Automated sentence boundary detection for spontaneous speech in Malay language / Muhammad Izzad Ramli

Ramli, Muhammad Izzad (2013) Automated sentence boundary detection for spontaneous speech in Malay language / Muhammad Izzad Ramli. Masters thesis, Universiti Teknologi MARA.

Abstract

Sentence boundary detection (SBD), also known as sentence breaking, decides where sentences begin and end. It is necessary in many applications, such as speech summarization, video summarization, and speech document indexing and retrieval. This research describes sentence boundary detection in spontaneous spoken Malay audio. Spontaneous speech is speech that is not planned or arranged beforehand. Speech studies of spontaneous Malay are still lacking, and no work has been done on sentence boundary detection. Previous studies showed that combining linguistic and acoustic approaches for sentence boundary detection gives better results than using either approach alone. However, because a linguistic model for the Malay language is not yet available, only the acoustic approach is used for Malay sentence boundary detection. Therefore, a combination of prosodic features with volume features and rate-of-speech (ROS) is proposed for sentence boundary detection in spontaneous speech. The data are spontaneous speeches from the Malaysian Parliament Hansard Document (MPHD). Experiments are conducted on 42 minutes of spontaneous Malay speech comprising 6,413 speech and non-speech segments. The non-speech segments are then selected as the candidates for the sentence boundary detection experiments. The proposed speech and non-speech detection method achieves an accuracy of 97.8%, and sentence boundary detection achieves 100% with a false alert rate of 19.44%. As the outcome, the proposed method, which fuses prosodic features, volume, and rate-of-speech (ROS) with AdaBoost, managed to detect and label sentence boundaries automatically.
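
The candidate-selection step described in the abstract (split the audio into speech and non-speech segments, then treat non-speech segments as sentence boundary candidates) can be illustrated with a minimal volume-based sketch. This is not the thesis's method, which the abstract does not detail: the frame length, the fixed RMS threshold, the minimum pause length, and the function names `frame_rms`, `segment`, and `boundary_candidates` are all assumptions made for illustration.

```python
import math


def frame_rms(samples, frame_len):
    """Root-mean-square volume of consecutive, non-overlapping frames."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def segment(samples, frame_len, threshold):
    """Group frames into (label, first_frame, last_frame) runs.

    Assumption: a frame counts as speech when its RMS volume reaches
    the threshold; the thesis may use a different decision rule.
    """
    segments = []
    for idx, rms in enumerate(frame_rms(samples, frame_len)):
        label = "speech" if rms >= threshold else "non-speech"
        if segments and segments[-1][0] == label:
            # Extend the current run of identically labelled frames.
            segments[-1] = (label, segments[-1][1], idx)
        else:
            segments.append((label, idx, idx))
    return segments


def boundary_candidates(segments, min_frames):
    """Keep non-speech runs of at least min_frames frames as candidates."""
    return [
        seg for seg in segments
        if seg[0] == "non-speech" and seg[2] - seg[1] + 1 >= min_frames
    ]


# Toy signal: speech, a long pause, speech, then a short pause.
toy = [0.5] * 4 + [0.0] * 6 + [0.5] * 4 + [0.0] * 2
runs = segment(toy, frame_len=2, threshold=0.1)
candidates = boundary_candidates(runs, min_frames=2)
```

In the abstract's pipeline, candidates like these would then be described by prosodic, volume, and rate-of-speech features and passed to an AdaBoost classifier. That stage is left out here because the abstract does not specify the feature definitions.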

Metadata

Item Type: Thesis (Masters)
Creators:
Creators | Email / ID Num.
Ramli, Muhammad Izzad | 2011289094
Contributors:
Contribution | Name | Email / ID Num.
Thesis advisor | Jamil, Nursuriati (Assoc. Prof. Dr.) | UNSPECIFIED
Subjects: Q Science > QA Mathematics > Response surfaces (Statistics)
Q Science > QA Mathematics > Evolutionary programming (Computer science). Genetic algorithms
Q Science > QA Mathematics > Web databases
Divisions: Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences
Programme: Master of Sciences (CS 780)
Keywords: Sentence boundary detection (SBD), Malay, Language
Date: 2013
URI: https://ir.uitm.edu.my/id/eprint/47023

Download

47023.pdf (Text, 134kB)

Physical Copy

Physical status and holdings:
Item Status: On Shelf

ID Number

47023
