A N-gram based approach to auto-extracting topics from research articles

Authors:

Zhu Linkai (1, 2), Wang Wennan (2), Huang Maoyi (3), Chen Maomao (4), Wang Yiyun (5, 6, 7), Cai Zhiming (2)

Affiliations:

1. Institute of Software, Chinese Academy of Sciences, Beijing, China

2. Institute of Data Science, City University of Macau, Macau, China

3. Product Development, Ericsson, Gothenburg, Sweden

4. Department of Computer Science and Engineering, University of Gothenburg, Gothenburg, Sweden

5. Faculty of Education, University of Malaya, Kuala Lumpur, Malaysia

6. Applied Psychology Program, BNU-HKBU United International College, China

7. College of Education for the Future, Beijing Normal University, Zhuhai, PR China

Abstract

A lot of manual work goes into identifying the topic of an article, and with a large volume of articles the manual process becomes exhausting. Our approach addresses this issue by automatically extracting topics from the text of large numbers of articles, with particular attention to the efficiency of the process. Building on existing N-gram analysis, we examine how often certain words appear in documents in order to support automatic topic extraction. To improve efficiency, we apply custom filtering criteria and remove as many noncritical or irrelevant phrases as possible. In this way, we ensure that the keyphrases selected for each article are unique to it and capture its core idea [1]. We chose to focus on the autonomous vehicle domain because research in this area is relevant to daily life. Because most research papers are available only in PDF format, we first convert them into editable formats such as TXT. To test the proposed automated approach, we selected numerous articles on robotics. Finally, we evaluate our approach by comparing its results with those of other models.
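The abstract describes three steps: counting N-gram frequencies, filtering out noncritical phrases, and keeping keyphrases that are unique to each article. The following is a minimal Python sketch of how those steps could fit together; it is not the authors' implementation, whose filtering standards are not given here. The stopword list, the rule that an n-gram may not begin or end with a stopword, splitting on punctuation, dropping a phrase when a longer phrase containing it occurs equally often, and the parameters `top_k`, `max_n`, and `max_doc_ratio` are all illustrative assumptions, as are the hypothetical helper names `candidate_counts`, `drop_subsumed`, and `extract_keyphrases`.

```python
import re
from collections import Counter

# Hypothetical stopword list (assumption); the paper's own filtering standards are not given.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
}


def candidate_counts(text, max_n=3):
    """Count word n-grams (n = 1..max_n) that neither start nor end with a stopword."""
    counts = Counter()
    # Split on punctuation so n-grams do not span sentence or clause boundaries.
    for fragment in re.split(r"[.!?;:,]+", text.lower()):
        tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", fragment)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue  # filter noncritical phrases such as "the road"
                counts[" ".join(gram)] += 1
    return counts


def drop_subsumed(counts):
    """Remove a phrase when a longer phrase containing it occurs equally often."""
    kept = {}
    for phrase, freq in counts.items():
        padded = f" {phrase} "  # pad so "map" does not match inside "maps"
        subsumed = any(
            other != phrase and padded in f" {other} " and other_freq == freq
            for other, other_freq in counts.items()
        )
        if not subsumed:
            kept[phrase] = freq
    return kept


def extract_keyphrases(docs, top_k=3, max_n=3, max_doc_ratio=0.5):
    """Return up to top_k keyphrases per document, preferring phrases specific to it."""
    per_doc = [candidate_counts(doc, max_n) for doc in docs]
    doc_freq = Counter()
    for counts in per_doc:
        doc_freq.update(counts.keys())  # number of documents containing each phrase
    # Phrases found in more documents than this limit are treated as generic.
    limit = max(1.0, max_doc_ratio * len(docs))
    results = []
    for counts in per_doc:
        specific = {p: f for p, f in counts.items() if doc_freq[p] <= limit}
        kept = drop_subsumed(specific)
        # Rank by frequency, then longer phrases first, then alphabetically for ties.
        ranked = sorted(kept.items(), key=lambda pf: (-pf[1], -len(pf[0].split()), pf[0]))
        results.append([phrase for phrase, _ in ranked[:top_k]])
    return results
```

The document-frequency threshold is a simple stand-in for selecting keyphrases that are unique to each article: a phrase shared by many articles is treated as generic and discarded. TF-IDF weighting would be a common alternative design choice for the same goal.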

Publisher

IOS Press

Subject

Artificial Intelligence, General Engineering, Statistics and Probability

References (19 articles)

1. Turney P.D., Learning algorithms for keyphrase extraction, Institute for Information Technology, National Research Council of Canada, Ottawa, Ontario, Canada, October 1999.

2. Campos, YAKE! Keyword extraction from single documents using multiple local features, Inf Sci (Ny), 2020.

3. Liu Z., Huang W., Zheng Y. and Sun M., Automatic Keyphrase Extraction via Topic Decomposition, In Proc. of the Empirical Methods in Natural Language Processing, Computational Linguistics, pp. 366–376, MIT, Massachusetts, USA, October 2010.

4. Hur, Topic Automatic Extraction Model based on Unstructured Security Intelligence Report, J Korea Converg Soc, 2019.

5. Merrouni, Automatic keyphrase extraction: a survey and trends, J Intell Inf Syst, 2020.

Cited by 2 articles.

1. Sentiment Analysis in Product Reviews with Maximum Entropy and Naïve Bayes Using N-gram Method, 2023 6th International Conference on Information and Communications Technology (ICOIACT), 2023-11-10.

2. An automatic speech analytics program for digital assessment of stress burden and psychosocial health, npj Mental Health Research, 2023-09-13.
