Text Representations for Patent Classification

Published: 2013-09 Issue: 3 Volume: 39 Page: 755-775
ISSN: 0891-2017
Container-title: Computational Linguistics
Language: en
Short-container-title: Computational Linguistics

Author:

D'hondt Eva¹, Verberne Suzan¹, Koster Cornelis¹, Boves Lou¹

Affiliation:

1. Radboud University Nijmegen

Abstract

With the increasing rate of patent application filings, automated patent classification is of rising economic importance. This article investigates how patent classification can be improved by using different representations of the patent documents. Using the Linguistic Classification System (LCS), we compare the impact of adding statistical phrases (in the form of bigrams) and linguistic phrases (in two different dependency formats) to the standard bag-of-words text representation on a subset of 532,264 English abstracts from the CLEF-IP 2010 corpus. In contrast to previous findings on classification with phrases in the Reuters-21578 data set, for patent classification the addition of phrases results in significant improvements over the unigram baseline. The best results were achieved by combining all four representations, and the second best by combining unigrams and lemmatized bigrams. This article includes extensive analyses of the class models (a.k.a. class profiles) created by the classifiers in the LCS framework, to examine which types of phrases are most informative for patent classification. It appears that bigrams contribute most to improvements in classification accuracy. Similar experiments were performed on subsets of French and German abstracts to investigate the generalizability of these findings.
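To make the idea of adding statistical phrases to a bag-of-words representation concrete, here is a minimal sketch in Python. It is not the LCS pipeline used in the article: the function name `unigram_bigram_features`, the whitespace tokenization, and the underscore-joined bigram keys are all assumptions chosen for illustration, and no lemmatization or dependency parsing is performed.

```python
from collections import Counter


def unigram_bigram_features(text):
    """Return counts of unigrams and adjacent-word bigrams for one document.

    Hypothetical helper for illustration only; it does not reproduce the
    article's LCS feature extraction.
    """
    # Naive tokenization (assumption): lowercase, split on whitespace.
    tokens = text.lower().split()
    # Standard bag-of-words: one feature per distinct token.
    features = Counter(tokens)
    # Statistical phrases: every pair of adjacent tokens, joined with "_".
    features.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return features


feats = unigram_bigram_features("optical fiber cable with optical fiber core")
print(feats["optical"], feats["optical_fiber"], feats["fiber_cable"])
```

A classifier trained on such counts sees "optical_fiber" as a feature in its own right, next to "optical" and "fiber", which is the kind of combined representation the abstract compares against the unigram baseline.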

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00149

References: 38 articles.

1. Automated learning of decision rules for text categorization

2. Automated Patent Classification

Cited by 34 articles.

1. Deep learning for predicting patent application outcome: The fusion of text and network embeddings;Journal of Informetrics;2023-05

2. Automated patent classification for crop protection via domain adaptation;Applied AI Letters;2023-02

3. Unveiling Black-Boxes: Explainable Deep Learning Models for Patent Classification;Communications in Computer and Information Science;2023

4. SEA-PS: Semantic embedding with attention to measuring patent similarity by leveraging various text fields;Journal of Information Science;2022-07-08

5. AI for Patents: A Novel Yet Effective and Efficient Framework for Patent Analysis;IEEE Access;2022