Predictive Analysis for Text Classification: Discrete Units in Company Registration Discourse

Author:

Więcławska Edyta¹

Affiliation:

1. University of Rzeszów, Poland

Abstract

Legal discourse varies most commonly across languages, textual genres, communicative settings (professional vs. lay communication), translation methods and categories of authors. The last of these serves as the testing ground for the text-prediction task presented in this article. The research project involves quantitative analysis of selected discrete units and their statistical processing in R in order to generate random forest and decision tree models. The hypothesis is that text authorship can be predicted effectively from the grammatical profile of the texts. The prediction model proposed here covers two authorship categories, institutional name and professional title, each of which encapsulates authorship sub-categories related to institutional background and work position. In both cases the prediction accuracy parameters for the authorship-based text classification prove to be statistically satisfactory. More specific findings show that the text classification models are more effective for some authorship sub-categories than for others. Further, some discrete units have distinctively high discriminative power for the texts. The analysis is conducted on a custom-designed corpus composed of English texts processed in company registration proceedings. The corpus is homogeneous in terms of the function and the communicative context of the texts, which assures the reliability of the findings, while the varied authorship factor captures the variationist aspect of legal communication.
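
The general technique named in the abstract (decision trees and a random forest trained on per-text counts of discrete units) can be illustrated with a minimal sketch. It is not the author's R workflow: it is a standard-library Python stand-in, and the three features (passives, modals and nominalisations per 1,000 words), the labels "A" and "B" and all numeric values are hypothetical placeholders, not data from the study.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(rows, labels, rng, n_try, depth=0, max_depth=3):
    """Grow a tree: a leaf is a label, a node is (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    # Consider only a random subset of features at each node.
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return majority
    f, t = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            grow_tree([rows[i] for i in li], [labels[i] for i in li],
                      rng, n_try, depth + 1, max_depth),
            grow_tree([rows[i] for i in ri], [labels[i] for i in ri],
                      rng, n_try, depth + 1, max_depth))

def predict_tree(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def grow_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the texts."""
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], rng, n_try))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Hypothetical grammatical profiles: [passives, modals, nominalisations] per 1,000 words.
texts = [
    [12.0, 30.0, 8.0], [14.0, 28.0, 9.0], [11.0, 33.0, 7.5],
    [13.5, 31.0, 8.5], [12.5, 29.0, 9.5],
    [25.0, 55.0, 18.0], [27.0, 52.0, 20.0], [24.0, 58.0, 17.0],
    [26.5, 54.0, 19.0], [28.0, 50.0, 21.0],
]
authors = ["A"] * 5 + ["B"] * 5  # hypothetical authorship sub-categories

forest = grow_forest(texts, authors)
print(predict_forest(forest, [26.0, 53.0, 19.0]))
```

Each node here considers only a random subset of the features, and each tree sees a bootstrap sample of the texts; the forest then classifies a new text by majority vote. In a real study, features that are chosen often and yield clean splits would be the ones with high discriminative power.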

Publisher

University of Bialystok

