Affiliation:
1. Electrical Engineering and Computer Science, University of California at Berkeley, Berkeley, California, United States of America
2. Google Inc., Mountain View, CA, United States of America
Abstract
Over the past decade, a mixture of optical character recognition and manual input has produced a growing corpus of Tibetan literature available as e-texts in Unicode format. With such a corpus, the text analytics techniques that have been applied to English and other modern languages can now be applied to Tibetan. In this work, we narrow our focus to a modest portion of that literature: the Mind-section portion of the literature of the Tibetan tradition of the Great Perfection. We use text analytics tools based on machine learning techniques to investigate a number of questions of interest to scholars of this and related traditions of the Great Perfection. It has been necessary for us to participate in every stage of this process: corpus identification and text edition selection, rendering the texts as e-texts in Unicode using both optical character recognition and manual entry, data cleaning and transformation, implementation of software for text analysis, and interpretation of results. For this reason, we hope this study can serve as a model for work on other low-resource languages for which text analytics is only beginning to be developed.
Publisher
Association for Computing Machinery (ACM)
Cited by
2 articles.