UDAT: Compound quantitative analysis of text using machine learning

Published:2020-03-13 Issue:1 Volume:36 Page:187-208
ISSN:2055-7671
Container-title:Digital Scholarship in the Humanities
language:en

Author:

Shamir Lior¹

Affiliation:

1. Kansas State University, USA

Abstract

Computing machines allow quantitative analysis of large databases of text, providing knowledge that is difficult to obtain without automation. This article describes Universal Data Analysis of Text (UDAT), a text analysis method that extracts a large set of numerical text content descriptors from text files and performs various pattern recognition tasks such as classification, similarity between classes, correlation between text and numerical values, and query by example. Unlike several previously proposed methods, UDAT is not based on the frequency of words or on links between certain key words and topics. The method is implemented as an open-source software tool that provides detailed reports about the quantitative analysis of sets of text files and exports the numerical text content descriptors as comma-separated values files to allow statistical or pattern recognition analysis with external tools. It also allows the identification of specific text descriptors that differentiate between classes or correlate with numerical values, and can be applied to problems related to knowledge discovery in domains such as literature and social media. UDAT is implemented as a command-line tool that runs in Windows, and the source code is available and can be compiled on Linux systems. UDAT can be downloaded from http://people.cs.ksu.edu/~lshamir/downloads/udat.
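
To make the idea of "numerical text content descriptors" exported as comma-separated values more concrete, the following is a minimal sketch in Python. It is not UDAT's code and does not reproduce UDAT's descriptor set or file layout: the function names `describe` and `to_csv`, the five descriptors, and the column names are all hypothetical and chosen only for illustration.

```python
import csv
import io
import re
import statistics

# Hypothetical column layout; UDAT's real descriptors are not listed in the abstract.
FIELDS = [
    "name",
    "word_count",
    "mean_word_length",
    "words_per_sentence",
    "type_token_ratio",
    "comma_rate",
]


def describe(text: str) -> dict:
    """Compute a few illustrative numerical descriptors for one text."""
    words = re.findall(r"[A-Za-z']+", text)
    # Treat runs of '.', '!' or '?' as sentence boundaries (a crude assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    return {
        "word_count": float(n_words),
        "mean_word_length": statistics.fmean(len(w) for w in words) if words else 0.0,
        "words_per_sentence": n_words / len(sentences) if sentences else 0.0,
        "type_token_ratio": len({w.lower() for w in words}) / n_words if words else 0.0,
        "comma_rate": text.count(",") / n_words if words else 0.0,
    }


def to_csv(samples: dict) -> str:
    """Serialize descriptors for named text samples as CSV, one row per sample."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for name, text in samples.items():
        writer.writerow({"name": name, **describe(text)})
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv({
        "sample_a": "The cat sat. The dog ran!",
        "sample_b": "Hello, world.",
    }))
```

A table of this kind, with one row per text and one column per descriptor, is the sort of input that external statistical or pattern recognition tools can consume directly, which is the workflow the abstract describes.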

Funder

National Science Foundation

Teaching to Increase Diversity and Equity in STEM

Association of American Colleges and Universities

Publisher

Oxford University Press (OUP)

Subject

Computer Science Applications, Linguistics and Language, Language and Linguistics, Information Systems

Link

http://academic.oup.com/dsh/article-pdf/36/1/187/37603493/fqaa007.pdf


Cited by 4 articles.

1. Analysis and Prevention of AI-Based Phishing Email Attacks;Electronics;2024-05-09

2. Tell me how you write and I'll tell you what you read: a study on the writing style of book reviews;Journal of Documentation;2023-06-22

3. Data Science Approach to Compare the Lyrics of Popular Music Artists;Unisia;2022-07-03

4. A new measurement method of Chinese texts’ difficulty based on the digital analysis of two-character continuations;Digital Scholarship in the Humanities;2022-05-12