Abstract
Speech is structured as cohesive sequences of information units marked by speaker-specific intonation patterns, known as prosody. To capture a speaker's communicative intention, listeners first attend to rising and/or falling prosody patterns (known as speech events) on words and phrases to gauge the degree of information conveyed. Second, speakers use intonational breaks to deliver speech content as a sequence of smaller sub-segments, known as phrasing. When events and boundaries are perceived incorrectly, listeners misunderstand the communicative intention. Previous work has therefore evaluated the role of boundaries in languages such as English, Swedish, Japanese, and Chinese to support automatic speech recognition and understanding. The main issue addressed by previous work on boundary classification is disagreement in prosody event labeling: boundaries are misclassified because of limitations in expert evaluation of prosody events and the lack of a prosody standard. How pause and intonation variations discriminate a phrase with continuing content from subsequent phrases that carry the end (final) content also remains an open question. The limited work on the role of prosody is often defined in terms of language-specific lexical rules, which hinders the understanding of phrasing and speech content for under-resourced languages. Because related research is limited to prosodic stress indicating prominence on individual Malay words, processes for defining the prosody that governs Malay phrase boundaries are essential. This research aims to evaluate the prosodic correlates of boundaries with deep (final) or shallow (continual) content. The dataset for the boundary classification task contains formal statements recorded from debate sessions in the Malaysian Parliamentary Speech recordings.
In the first phase, a refined set of Rapid Phrase Prosody Tasks (RPIT) instructions was used to label the speech signal with perceived continual or final content. Responses from four male and four female volunteers were analysed using Kappa and Krippendorff analysis to construct a Malay phrase (MySP) boundaries dataset with an average of 85% agreement on the perceived boundary labels. Pitch regression and rise-fall-connection (RFC) contour parameters were used to model the role of prosody on the perceived boundaries. A new phrase-strength correlate was computed as a slope feature from the highest nuclei point towards the end of the boundary word region. The role of each feature vector, combined with word and silence durations, in signifying deep (final) and shallow (continual) boundaries was tested with supervised classifiers. The K-Nearest Neighbour (KNN), Random Forest (RF), and Logistic Regression (Log-Reg) models predicted the boundary classes with up to 75% accuracy. A higher degree of slope excursion was observed on boundaries that listeners perceived as deep (evaluated as having finality in speech content through the RPIT), and this was used to improve classification results on up to 20% of the falsely classified boundaries. This study contributes to knowledge of Malay prosody and to the classification of phrase boundaries.
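The slope feature mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definition, so the following are assumptions: the "highest nuclei point" is approximated by the highest F0 sample in the boundary word region, the slope is an ordinary least-squares fit from that sample to the end of the region, and unvoiced frames have already been removed. The function name `peak_to_end_slope` is hypothetical and is not taken from the thesis.

```python
from typing import Sequence


def peak_to_end_slope(times: Sequence[float], f0: Sequence[float]) -> float:
    """Least-squares F0 slope (Hz per second) from the F0 peak to the region end.

    Assumption: the highest F0 sample stands in for the "highest nuclei point";
    the thesis may locate this point differently.
    """
    if len(times) != len(f0) or len(f0) < 2:
        raise ValueError("need at least two paired (time, F0) samples")

    # Index of the highest F0 sample within the boundary word region.
    peak = max(range(len(f0)), key=lambda i: f0[i])

    # Keep only the samples from the peak to the end of the region.
    t = list(times[peak:])
    y = list(f0[peak:])
    if len(t) < 2:
        return 0.0  # the peak is the last sample, so no slope can be fitted

    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n

    # Ordinary least-squares slope: cov(t, y) / var(t).
    num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    den = sum((ti - t_mean) ** 2 for ti in t)
    if den == 0:
        return 0.0  # all remaining samples share one timestamp
    return num / den


if __name__ == "__main__":
    # Illustrative values only (not data from the thesis): F0 peaks at 0.1 s
    # and then falls by 20 Hz every 0.1 s.
    times = [0.0, 0.1, 0.2, 0.3, 0.4]
    f0 = [180.0, 220.0, 200.0, 180.0, 160.0]
    print(peak_to_end_slope(times, f0))
```

For the illustrative contour above, the fitted slope is about -200 Hz/s. Under these assumptions, a steeper fall would correspond to the larger slope excursion that the abstract associates with deep (final) boundaries; the resulting value could then be combined with word and silence durations as one classifier input.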
Metadata
Item Type: | Thesis (PhD) |
---|---|
Creators: | Mohamed Hanum, Haslizatul Fairuz (ID Num. 2011218432) |
Contributors: | Thesis advisor: Abdullah, Nur Atiqah Sia (Assoc. Prof. Ts. Dr.); Email / ID Num.: UNSPECIFIED |
Subjects: | P Language and Literature > P Philology. Linguistics > Language. Linguistic theory. Comparative grammar; P Language and Literature > P Philology. Linguistics > Language. Linguistic theory. Comparative grammar > Comparative grammar > Phonology. Phonetics |
Divisions: | Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences |
Programme: | Doctor of Philosophy (Information Technology and Quantitative Science) – CS990 |
Keywords: | speech, malay, intonation |
Date: | October 2021 |
URI: | https://ir.uitm.edu.my/id/eprint/54987 |