Co-Occurrence-Based Error Correction Approach to Word Segmentation

Author(s): Ekawat Chaowicharat (Mahidol University, Thailand) and Kanlaya Naruedomkul (Mahidol University, Thailand)
Copyright: 2012
Pages: 11
Source title: Cross-Disciplinary Advances in Applied Natural Language Processing: Issues and Approaches
Source Author(s)/Editor(s): Chutima Boonthum-Denecke (Hampton University, USA), Philip M. McCarthy (The University of Memphis, USA) and Travis Lamkin (University of Memphis, USA)
DOI: 10.4018/978-1-61350-447-5.ch023

Abstract

A number of word segmentation algorithms have been proposed in the past; however, there is still room for improvement. Co-occurrence-Based Error Correction (CBEC), the approach proposed in this chapter, is a novel Thai word segmentation approach designed to produce accurate segmentation results based on context and purpose. CBEC first segments the input string quickly using any available algorithm; maximal matching was used in the experiment. Next, CBEC checks the segmentation output against an error risk data bank to determine whether it contains any error risk. The error risk data bank is built from a training corpus; the current version is based on the training corpus provided for BEST 2009. Finally, CBEC re-segments the input string using the co-occurrence score of the word sequence to ensure the accuracy of the segmentation result.
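The three stages in the abstract (fast segmentation, error-risk lookup, co-occurrence-based re-segmentation) can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation, and it rests on several assumptions: "maximal matching" is taken to mean the dictionary segmentation with the fewest words; the error risk bank is assumed to store the surface strings where maximal matching disagreed with the gold segmentation of the training corpus; and the co-occurrence score is assumed to be the sum of adjacent word-pair counts from that corpus. All function names, the toy English lexicon, and the toy training sentences are illustrative stand-ins for Thai text and the BEST 2009 corpus.

```python
from collections import Counter

# Hypothetical sketch of a CBEC-style pipeline; not the authors' code.


def maximal_matching(text, lexicon):
    """Fewest-word dictionary segmentation (assumed reading of 'maximal matching')."""
    n = len(text)
    max_len = max(map(len, lexicon), default=1)
    best = [None] * (n + 1)  # best[i] = (word_count, next_cut) for text[i:]
    best[n] = (0, n)
    for i in range(n - 1, -1, -1):
        # Candidate word ends, longest first, so ties favor longer words.
        ends = [j for j in range(min(n, i + max_len), i, -1) if text[i:j] in lexicon]
        if not ends:
            ends = [i + 1]  # no dictionary word starts here: emit one character
        for j in ends:
            cost = best[j][0] + 1
            if best[i] is None or cost < best[i][0]:
                best[i] = (cost, j)
    tokens, i = [], 0
    while i < n:
        j = best[i][1]
        tokens.append(text[i:j])
        i = j
    return tokens


def boundaries(tokens):
    """Character offsets of every word boundary, including 0 and the end."""
    cuts, pos = {0}, 0
    for t in tokens:
        pos += len(t)
        cuts.add(pos)
    return cuts


def build_risk_bank(gold_sentences, lexicon):
    """Assumed bank: substrings where maximal matching disagrees with the gold data."""
    bank = set()
    for gold in gold_sentences:
        text = "".join(gold)
        mm_cuts = boundaries(maximal_matching(text, lexicon))
        gold_cuts = boundaries(gold)
        common = sorted(mm_cuts & gold_cuts)
        for a, b in zip(common, common[1:]):
            # A cut strictly inside (a, b) means the two segmentations differ here.
            if any(a < c < b for c in mm_cuts | gold_cuts):
                bank.add(text[a:b])
    return bank


def bigram_counts(gold_sentences):
    """Adjacent word-pair counts, used here as a stand-in co-occurrence statistic."""
    counts = Counter()
    for sent in gold_sentences:
        counts.update(zip(sent, sent[1:]))
    return counts


def all_segmentations(span, lexicon):
    """Every way to split span into dictionary words (exponential; short spans only)."""
    if not span:
        return [[]]
    results = []
    for end in range(len(span), 0, -1):
        head = span[:end]
        if head in lexicon:
            for tail in all_segmentations(span[end:], lexicon):
                results.append([head] + tail)
    return results


def cooccurrence_score(words, counts):
    return sum(counts[pair] for pair in zip(words, words[1:]))


def cbec(text, lexicon, risk_bank, counts):
    tokens = maximal_matching(text, lexicon)  # stage 1: fast segmentation
    out, i = [], 0
    while i < len(tokens):
        # Stage 2: longest run of tokens starting at i whose surface form is a known risk.
        j = next((j for j in range(len(tokens), i, -1)
                  if "".join(tokens[i:j]) in risk_bank), None)
        if j is None:
            out.append(tokens[i])
            i += 1
            continue
        # Stage 3: re-segment the risky span, scoring each candidate with its neighbors.
        original = tokens[i:j]
        span = "".join(original)
        candidates = [original] + [c for c in all_segmentations(span, lexicon) if c != original]
        left, right = out[-1:], tokens[j:j + 1]
        # max() returns the first best candidate, so ties keep the original output.
        chosen = max(candidates, key=lambda c: cooccurrence_score(left + c + right, counts))
        out.extend(chosen)
        i = j
    return out


if __name__ == "__main__":
    LEXICON = {"he", "she", "is", "it", "was", "now", "here", "nowhere", "no", "where"}
    TRAIN = [["he", "is", "now", "here"], ["it", "was", "nowhere"], ["she", "was", "nowhere"]]
    bank = build_risk_bank(TRAIN, LEXICON)
    counts = bigram_counts(TRAIN)
    print(bank)                                          # {'nowhere'}
    print(cbec("sheisnowhere", LEXICON, bank, counts))   # ['she', 'is', 'now', 'here']
    print(cbec("itwasnowhere", LEXICON, bank, counts))   # ['it', 'was', 'nowhere']
```

On the toy data, maximal matching segments "heisnowhere" as he|is|nowhere, so "nowhere" enters the risk bank. At test time "sheisnowhere" is re-segmented to she|is|now|here because "is now" and "now here" co-occur in training, while "itwasnowhere" keeps "nowhere" because "was nowhere" is the more frequent pair. A real system would use a richer co-occurrence measure and a Thai lexicon; the adjacent-pair count is only the simplest choice that shows the idea.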

Related Content

Reinaldo Padilha França, Ana Carolina Borges Monteiro, Rangel Arthur, Yuzo Iano. © 2021. 21 pages.
Abdul Kader Saiod, Darelle van Greunen. © 2021. 28 pages.
Aswini R., Padmapriya N. © 2021. 22 pages.
Zubeida Khan, C. Maria Keet. © 2021. 21 pages.
Neha Gupta, Rashmi Agrawal. © 2021. 20 pages.
Kamalendu Pal. © 2021. 14 pages.
Joy Nkechinyere Olawuyi, Bernard Ijesunor Akhigbe, Babajide Samuel Afolabi, Attoh Okine. © 2021. 19 pages.