Soft Bigram distance for names matching

Authors:

Hadwan Mohammed (1, 2, 3), Al-Hagery Mohammed A. (4), Al-Sanabani Maher (5), Al-Hagree Salah (6)

Affiliations:

1. Department of Information Technology, College of Computer, Qassim University, Buraydah, Saudi Arabia

2. Intelligent Analytics Group (IAG), College of Computer, Qassim University, Buraydah, Saudi Arabia

3. Department of Computer Sciences, Faculty of Applied Sciences, Taiz University, Taiz, Yemen

4. Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia

5. Faculty of Computer Science and Information Systems, Thamar University, Thamar, Yemen

6. Department of Computer Sciences & Information Technology, IBB University, IBB, Yemen

Abstract

Background. Bi-gram distance (BI-DIST) is a recent approach to measuring the distance between two strings, a task that plays an important role in a wide range of applications. BI-DIST matters because of its representational and computational efficiency, which has led to extensive research to enhance it further. However, developing an algorithm that measures string distance both accurately and efficiently remains a major challenge. This research therefore aims to design an algorithm that matches names accurately. BI-DIST is considered the best orthographic measure for name identification; nevertheless, it lacks a distance scale between name bigrams.

Methods. This research proposes the Soft Bigram Distance (Soft-Bidist) measure, an extension of BI-DIST that softens the scale of comparison among name bigrams to improve name matching. Different datasets are used to demonstrate the efficiency of the proposed method.

Results. The results show that Soft-Bidist outperforms the compared algorithms on different name matching datasets.
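To make the idea of "softening" the bigram comparison concrete, the following Python sketch contrasts a hard (0 or 1) bigram substitution cost with a graded one inside a bigram-level edit distance. It is only an illustration under stated assumptions: the abstract does not give the Soft-Bidist weights, so the function names, the unpadded bigram extraction, the value 0.5 for a partial match, and the normalisation by the longer bigram sequence are all hypothetical choices, not the authors' definition.

```python
from typing import Callable, List


def bigrams(name: str) -> List[str]:
    """Split a name into overlapping character bigrams (no padding; an assumption)."""
    return [name[i:i + 2] for i in range(len(name) - 1)]


def binary_cost(a: str, b: str) -> float:
    """Hard comparison: two bigrams either match exactly (0) or not at all (1)."""
    return 0.0 if a == b else 1.0


def soft_cost(a: str, b: str) -> float:
    """Graded comparison with an illustrative weight, NOT the paper's values.

    Bigrams that share a character in the same position cost 0.5 instead of 1.
    """
    if a == b:
        return 0.0
    if a[0] == b[0] or a[1] == b[1]:
        return 0.5
    return 1.0


def bigram_distance(x: str, y: str,
                    cost: Callable[[str, str], float] = binary_cost) -> float:
    """Edit distance over bigram sequences, normalised to the range [0, 1]."""
    gx, gy = bigrams(x), bigrams(y)
    m, n = len(gx), len(gy)
    if max(m, n) == 0:
        return 0.0  # both inputs are too short to form any bigram
    # d[i][j] = cheapest way to turn the first i bigrams of x into the first j of y
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,                              # delete a bigram
                d[i][j - 1] + 1.0,                              # insert a bigram
                d[i - 1][j - 1] + cost(gx[i - 1], gy[j - 1]),   # substitute
            )
    return d[m][n] / max(m, n)


print(bigram_distance("salah", "saleh", binary_cost))  # 0.5
print(bigram_distance("salah", "saleh", soft_cost))    # 0.25
```

With the hard cost, the spelling variants "salah" and "saleh" differ in two of four bigrams and get a distance of 0.5. With the graded cost, each of those two bigrams counts as a half mismatch, so the distance drops to 0.25, which reflects how a finer scale between bigrams can bring near-identical name variants closer together.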

Funder

Deanship of Scientific Research, Qassim University

Publisher

PeerJ

Subject

General Computer Science

