Word Segmentation for Burmese (Myanmar)

Published:2016-06-02 Issue:4 Volume:15 Page:1-10
ISSN:2375-4699
Container-title:ACM Transactions on Asian and Low-Resource Language Information Processing
language:en
Short-container-title:ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Ding Chenchen¹,Thu Ye Kyaw¹,Utiyama Masao¹,Sumita Eiichiro¹

Affiliation:

1. National Institute of Information and Communications Technology, Kyoto, Japan

Abstract

This note reports and discusses experiments on several word segmentation approaches for the Burmese language. Specifically, dictionary-based, statistical, and machine learning approaches are tested. Experimental results demonstrate that statistical and machine learning approaches perform significantly better than dictionary-based approaches. We believe that this note, based on a relatively large annotated corpus (approximately half a million words), is the first systematic comparison of word segmentation approaches for Burmese. This work aims to identify the properties of Burmese text and suitable approaches to processing it, and to promote further research on this understudied language.
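For readers unfamiliar with the dictionary-based family mentioned in the abstract, the following is a minimal sketch of one common technique, forward maximum matching (greedy longest match against a word list). It is an illustration only and is not taken from the paper: the function name `forward_maximum_matching`, the `toy_lexicon`, and the Latin-letter input are all hypothetical placeholders, and the sketch matches over Python string characters, whereas a practical Burmese system would need to choose a suitable matching unit.

```python
def forward_maximum_matching(text, lexicon, max_len=None):
    """Greedy longest-match segmentation against a word list (illustrative)."""
    if max_len is None:
        # Longest lexicon entry bounds how far ahead we need to look.
        max_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback,
            # so unknown text is still consumed one unit at a time.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon; placeholder strings, not real Burmese words.
toy_lexicon = {"ab", "abc", "cd", "d"}
segmented = forward_maximum_matching("abcd", toy_lexicon)
print(segmented)
```

In this toy case the greedy match yields `abc` followed by `d`, even though `ab` followed by `cd` is also a valid split under the same lexicon. Ambiguities of this kind are one general limitation of purely greedy dictionary matching.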

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/2846095

References: 15 articles.

1. Chinese word segmentation: A decade review;Huang Chang-Ning;J. Chin. Inform. Process.,2007

Cited by 11 articles.

1. Fast Recurrent Neural Network with Bi-LSTM for Handwritten Tamil Text Segmentation in NLP;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-05-10

2. Bi-directional Maximal Matching Algorithm to Segment Khmer Words in Sentence;J INF PROCESS SYST;2022

3. Tackling Hate Speech in Low-resource Languages with Context Experts;International Conference on Information & Communication Technologies and Development 2022;2022-06-27

4. Towards Tokenization and Part-of-Speech Tagging for Khmer: Data and Discussion;ACM Transactions on Asian and Low-Resource Language Information Processing;2021-11-30

5. A Neural Joint Model with BERT for Burmese Syllable Segmentation, Word Segmentation, and POS Tagging;ACM Transactions on Asian and Low-Resource Language Information Processing;2021-07-31