The Application of NLTK Library for Python Natural Language Processing in Corpus Research

Published: 2021-09-01 Issue: 9 Volume: 11 Page: 1041-1049
ISSN: 2053-0692
Container-title: Theory and Practice in Language Studies
Short-container-title: tpls

Author:

Wang Meng, Hu Fanghui

Abstract

Corpora play an important role in linguistics research and foreign language teaching. At present, corpus research in China relies mainly on retrieval tools such as WordSmith and AntConc. The NLTK library, which is built on Python, offers more flexible and richer research methods, and its unified data standards avoid the trouble of converting between data types. In addition, Python's many third-party libraries can make up for the shortcomings of other tools in areas such as syntactic analysis, graphic rendering, and regular expression retrieval. Focusing on the main steps of corpus research, namely text cleaning, lemmatization (word form restoration), part-of-speech tagging, and text retrieval and statistics, this paper takes the corpus of US presidential inaugural speeches as an example to show how the tool processes language data, and introduces the application of the Python NLTK library in corpus research.
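Two of the steps named in the abstract, text cleaning and text retrieval statistics, can be illustrated with a minimal sketch. It deliberately uses only Python's standard library so that it runs without installing NLTK or downloading corpus data, so it is not the authors' code. The sample sentence is a toy stand-in for the inaugural corpus, and the helper names `clean_and_tokenize`, `word_frequencies`, and `regex_search` are hypothetical. In NLTK itself, the corresponding pieces would be the `nltk.corpus.inaugural` reader, `nltk.word_tokenize`, and `nltk.FreqDist`.

```python
import re
from collections import Counter

# Toy sample standing in for one speech (an assumption for illustration;
# the paper's actual corpus is not reproduced here).
SAMPLE = (
    "Ask not what your country can do for you; "
    "ask what you can do for your country."
)


def clean_and_tokenize(text):
    """Text cleaning: lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(tokens):
    """Retrieval statistics: count how often each token occurs.

    nltk.FreqDist offers a similar counting interface.
    """
    return Counter(tokens)


def regex_search(tokens, pattern):
    """Regular expression retrieval: tokens that fully match the pattern."""
    compiled = re.compile(pattern)
    return [tok for tok in tokens if compiled.fullmatch(tok)]


tokens = clean_and_tokenize(SAMPLE)
freq = word_frequencies(tokens)

print(len(tokens))                   # number of word tokens after cleaning
print(freq["country"])               # frequency of a single word
print(regex_search(tokens, r"c.*"))  # all tokens beginning with "c"
```

Lemmatization and part-of-speech tagging are left out of the sketch because the NLTK components for them (`WordNetLemmatizer` and `nltk.pos_tag`) depend on downloadable data packages.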

Publisher

Academy Publication

Subject

Linguistics and Language, Language and Linguistics

Cited by 16 articles.

1. Navigating pathways to automated personality prediction: a comparative study of small and medium language models;Frontiers in Big Data;2024-09-13

2. Evolving Landscape of Smart Libraries: A Diachronic Analysis of Themes and Trends;Technical Services Quarterly;2024-08-23

3. Multi-view Counterfactual Contrastive Learning for Fact-checking Fake News Detection;Proceedings of the 2024 International Conference on Multimedia Retrieval;2024-05-30

4. Extraction of Meta-Data for Recommendation Using Keyword Mapping;IEEE Access;2024

5. The landscape of decentralized clinical trials (DCTs): focusing on the FDA and EMA guidance;Translational and Clinical Pharmacology;2024