Investigating the Impacts of Misspellings in Patent Search by Combining Natural Language Tools and Rule-Based Approaches

Authors:

Russo Davide, Spreafico Christian, Avogadri Simone, Precorvi Andrea

Abstract

Among all sources of technical information, patent information is one of the richest and most comprehensive, and knowing how to search this mass of documents is becoming increasingly crucial. However, many users have limited knowledge of patents and search strategies, so they rely on intuitive, often approximate approaches that can be time-consuming and lead to highly inaccurate searches. Tools exist that help expand queries to increase recall so that relevant documents are not missed; handling misspellings, however, remains an open problem. The presence of misspellings in patent text is typically underestimated even by experts in the field, and none of the available tools, free or paid, offers specific functionality to handle it. The goal of this article is to raise awareness of the difficulty of building a proper patent search strategy that also accounts for the possible presence of misspellings. It is important to know where misspellings are likely to occur and how much they may affect the final result. In particular, misspellings are divided into categories, distinguishing those associated with a generic keyword or multiword from those in acronyms, chemical formulas, names of applicants and inventors, or names of specific formulas or theorems. At least one example case is given for each category, showing when and how it may affect the result. Finally, an integrated approach is suggested for correcting the query: it combines word and contextual embedding models based on deep learning with a rule-based algorithm based on wildcards and truncation operators, automatically suggesting the most consistent misspellings and thus achieving a more accurate and reliable result.
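The rule-based half of the integrated approach outlined at the end of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the wildcard syntax (`*` for truncation, `?` for a single character), and the similarity threshold are all assumptions made for illustration, and a plain string-similarity score from Python's standard library stands in for the deep-learning embedding models, which the abstract does not specify.

```python
import re
from difflib import SequenceMatcher


def wildcard_to_regex(pattern):
    """Translate a search-style pattern into a compiled regex.

    Assumed syntax (hypothetical, not taken from the article):
    '*' matches zero or more word characters (truncation),
    '?' matches exactly one word character.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)


def candidate_misspellings(keyword, vocabulary, pattern, min_similarity=0.8):
    """Return vocabulary terms that look like misspellings of `keyword`.

    A term qualifies if it matches the wildcard pattern and its string
    similarity to the keyword reaches `min_similarity`. The similarity
    score is a stand-in for an embedding-based consistency check.
    """
    regex = wildcard_to_regex(pattern)
    target = keyword.lower()
    hits = []
    for term in vocabulary:
        lowered = term.lower()
        if lowered == target or not regex.match(term):
            continue
        score = SequenceMatcher(None, target, lowered).ratio()
        if score >= min_similarity:
            hits.append((term, score))
    # Most similar candidates first; ties keep vocabulary order.
    return sorted(hits, key=lambda hit: -hit[1])


# Toy vocabulary, invented for this example.
vocabulary = [
    "polyethylene", "polyethlene", "polyetylene",
    "polyester", "polyethylenes", "polypropylene",
]
suggestions = candidate_misspellings("polyethylene", vocabulary, "poly*ene")
suggested_terms = [term for term, _ in suggestions]
```

Here the wildcard pattern narrows the vocabulary to terms with the right prefix and suffix, and the similarity filter then discards a correctly spelled but different word such as "polypropylene". The surviving variants could be OR-ed into the original query to recover documents that contain the misspelled form.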

Publisher

MDPI AG


