
Multiclass patent document classification

Published: 2017-12-15 Issue: 1 Volume: 7 Page: 1
ISSN: 1927-6982
Container-title: Artificial Intelligence Research
Short-container-title: AIR

Author:

Anne Chaitanya, Mishra Avdesh, Hoque Md Tamjidul, Tu Shengru

Abstract

Text classification supports information extraction and retrieval, and it is an important step in managing the vast and growing number of records held in digital form. This article addresses the problem of classifying patent documents into fifteen categories (classes), some of which overlap for practical reasons. To develop the classification model using machine learning techniques, useful features were extracted from the documents. The features are used both to classify patent documents and to generate useful tag-words. The overall objective of this work is to systematize NASA’s patent management by developing a set of automated tools that help NASA manage and market its portfolio of intellectual property (IP) and make relevant IP easier for users to discover. We identified an array of applicable methods: k-Nearest Neighbors (kNN), two variations of the Support Vector Machine (SVM) algorithm, and two tree-based classification algorithms, Random Forest and J48. The major research steps in this paper are filtering techniques for variable selection, information gain and feature correlation analysis, and training and testing candidate models using effective classifiers. Further, the obstacles associated with imbalanced data were mitigated by adding pseudo-synthetic data where appropriate, which resulted in a superior SVM-based model.
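The abstract names information gain as one of the feature-analysis steps. The sketch below illustrates that idea only: it is not the authors' implementation, and the toy documents, labels, and function names (`entropy`, `information_gain`) are hypothetical. It scores each term by how much knowing its presence or absence reduces the entropy of the class labels, using only the Python standard library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Reduction in class entropy from knowing whether `term` occurs in a document."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    conditional = sum(
        (len(part) / n) * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - conditional


# Hypothetical toy corpus: each document is a set of tokens (not data from the paper).
docs = [
    {"rocket", "engine", "fuel"},
    {"rocket", "nozzle", "system"},
    {"sensor", "optical", "system"},
    {"sensor", "lens", "fuel"},
]
labels = ["propulsion", "propulsion", "optics", "optics"]

# Rank the vocabulary from most to least informative about the class.
vocab = sorted(set().union(*docs))
ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)

for term in ranked:
    print(f"{term}: {information_gain(docs, labels, term):.3f}")
```

In this toy corpus, "rocket" and "sensor" split the two classes perfectly and rank highest, while "fuel" and "system" appear equally in both classes and carry no information. A filter-style selection step would keep the top-ranked terms as features before training a classifier.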

Publisher

Sciedu Press

Cited by 13 articles.

1. The One-vs-Rest Method for a Multilabel Patent Classification Machine Learning Approach using a Regression Model; 2023 International Conference on Informatics, Multimedia, Cyber and Informations System (ICIMCIS); 2023-11-07

2. Auto Organizer: A Machine Learning-Based Tool for Automatic Organization of Files; Proceedings of the 2nd International Conference on Signal and Data Processing; 2023

3. Customized adjuncts with clear aligner therapy: “The Golden Circle Model” explained!; Journal of the World Federation of Orthodontists; 2022-12

4. On the design of Bayesian principled algorithms for imbalanced classification; Knowledge-Based Systems; 2021-06

5. A Comparison of Pre-Trained Language Models for Multi-Class Text Classification in the Financial Domain; Companion Proceedings of the Web Conference 2021; 2021-04-19