Abstract
Corpora play an important role in linguistics research and foreign language teaching. At present, corpus research in China relies mainly on retrieval tools such as WordSmith and AntConc. The NLTK library, which is built on the Python language, offers more flexible and varied research methods, and its unified data standards avoid the trouble of converting between data types. With the help of Python’s numerous third-party libraries, it can also make up for the shortcomings of other tools in areas such as syntactic analysis, graphic rendering and regular expression retrieval. Focusing on the main stages of corpus research (text cleaning, lemmatization, part-of-speech tagging, and text retrieval and statistics), this paper takes the corpus of US presidential inaugural addresses as an example to show how the tool can be used to process language data, and introduces the application of the Python NLTK library in corpus research.
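The stages named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's own procedure: a single sample sentence stands in for the inaugural address corpus, a Porter stemmer stands in for true lemmatization, and a toy rule-based tagger stands in for a trained part-of-speech tagger, so that the sketch runs without downloading any NLTK data. The names `SAMPLE` and `clean` are hypothetical.

```python
import re

from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tag import RegexpTagger
from nltk.tokenize import RegexpTokenizer

# Assumption: one sample sentence stands in for the full corpus.
SAMPLE = ("And so, my fellow Americans: ask not what your country can do "
          "for you; ask what you can do for your country.")


def clean(text):
    """Text cleaning: collapse whitespace and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


# Tokenization: keep alphabetic runs only, which drops punctuation.
tokenizer = RegexpTokenizer(r"[a-z]+")
tokens = tokenizer.tokenize(clean(SAMPLE))

# Word-form reduction: a stemmer is used here as a stand-in for
# lemmatization, because it needs no external data.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

# Part-of-speech tagging: a toy rule-based tagger (first matching
# pattern wins); a trained tagger would be used in practice.
tagger = RegexpTagger([
    (r"^(the|a|an)$", "DT"),  # articles
    (r".*s$", "NNS"),         # crude plural-noun guess
    (r".*", "NN"),            # fallback tag
])
tagged = tagger.tag(tokens)

# Retrieval statistics: word frequency counts.
freq = FreqDist(tokens)
print(freq.most_common(5))
```

With the relevant data packages installed through `nltk.download`, the stand-ins could be replaced by `nltk.corpus.inaugural` for the texts, `nltk.stem.WordNetLemmatizer` for lemmatization and `nltk.pos_tag` for tagging.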
Subject
Linguistics and Language, Language and Linguistics
Cited by
16 articles.