Multiclass patent document classification

Author:

Anne Chaitanya, Mishra Avdesh, Hoque Md Tamjidul, Tu Shengru

Abstract

Text classification supports information extraction and retrieval, and it is an important step in managing the vast and growing number of records held in digital form. This article addresses the problem of classifying patent documents into fifteen categories (classes), some of which overlap for practical reasons. To develop the classification model with machine learning techniques, useful features were extracted from the documents. These features are used both to classify patent documents and to generate useful tag words. The overall objective of this work is to systematize NASA’s patent management by developing a set of automated tools that help NASA manage and market its portfolio of intellectual property (IP) and make relevant IP easier for users to discover. We identified an array of applicable methods: k-Nearest Neighbors (kNN), two variations of the Support Vector Machine (SVM) algorithm, and two tree-based classification algorithms, Random Forest and J48. The major research steps in this paper are filtering techniques for variable selection, information gain and feature correlation analysis, and the training and testing of candidate models using effective classifiers. Further, the obstacles associated with imbalanced data were mitigated by adding pseudo-synthetic data where appropriate, which resulted in a superior SVM-classifier-based model.
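The abstract names information gain as one of the analyses used for variable selection. As a rough illustration only, the sketch below computes information gain for term presence on a tiny made-up corpus. The documents, class labels, and function names are hypothetical, and this is not the authors' pipeline or feature set.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    remainder = sum(
        (len(part) / n) * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - remainder


# Hypothetical toy corpus: each document is a set of tokens (assumption, not source data).
docs = [
    {"sensor", "optical", "lens"},
    {"sensor", "laser", "lens"},
    {"alloy", "coating", "sensor"},
    {"alloy", "polymer", "coating"},
]
labels = ["optics", "optics", "materials", "materials"]

# Rank every term in the vocabulary by its information gain, highest first.
vocabulary = sorted(set().union(*docs))
ranked = sorted(
    vocabulary,
    key=lambda t: information_gain(docs, labels, t),
    reverse=True,
)

if __name__ == "__main__":
    for term in ranked:
        print(f"{term}: {information_gain(docs, labels, term):.3f}")
```

In this toy setting, terms that appear in only one class (such as "lens") separate the classes perfectly and rank highest, while a term spread across classes (such as "sensor") ranks lower. A filter of this kind would keep the top-ranked terms as features before training a classifier.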

Publisher

Sciedu Press

Cited by 13 articles.

1. The One-vs-Rest Method for a Multilabel Patent Classification Machine Learning Approach using a Regression Model;2023 International Conference on Informatics, Multimedia, Cyber and Informations System (ICIMCIS);2023-11-07

2. Auto Organizer: A Machine Learning-Based Tool for Automatic Organization of Files;Proceedings of the 2nd International Conference on Signal and Data Processing;2023

3. Customized adjuncts with clear aligner therapy: “The Golden Circle Model” explained!;Journal of the World Federation of Orthodontists;2022-12

4. On the design of Bayesian principled algorithms for imbalanced classification;Knowledge-Based Systems;2021-06

5. A Comparison of Pre-Trained Language Models for Multi-Class Text Classification in the Financial Domain;Companion Proceedings of the Web Conference 2021;2021-04-19
