Two Probabilistic Models for Quick Dissimilarity Detection of Big Binary Data

Author:

Mustafa Adnan A. (1)

Affiliation:

1. Department of Mechanical Engineering, Kuwait University, P.O. Box 5969 Safat, Kuwait

Abstract

The task of data matching arises frequently in many areas of science. It can become a time-consuming process when the data are matched against a huge database of thousands of possible candidates and the goal is to find the best match. It can be even more time-consuming if the data are big (> 100 MB). One approach to reducing the time complexity of the matching process is to reduce the search space by introducing a pre-matching stage, in which very dissimilar data are quickly removed. In this paper we focus on matching big binary data and present two probabilistic models for the quick dissimilarity detection of such data: the Probabilistic Model for Quick Dissimilarity Detection of Binary vectors (PMQDD) and the Inverse-equality Probabilistic Model for Quick Dissimilarity Detection of Binary vectors (IPMQDD). Dissimilarity between binary vectors can be detected quickly by random element mapping. The detection technique is not a function of data size, and hence dissimilarity detection is performed quickly. We treat binary data as binary vectors, so binary data of any size and dimension is treated as a binary vector. PMQDD is based on a binary similarity distance that does not recognize data and its exact inverse as containing the same pattern, and hence considers them to be different. However, in some applications data and its inverse are regarded as the same pattern and should therefore be identified as being the same. IPMQDD is able to identify such cases, as it is based on a similarity distance that does not treat data and its inverse as dissimilar. We present a comparative analysis of PMQDD and IPMQDD, as well as of their similarity distances. We also present an application of the models to a set of object models, which shows the effectiveness and power of these models.
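The idea of comparing randomly chosen elements, with and without inverse-equality, can be illustrated with a minimal Python sketch. This is not the paper's PMQDD or IPMQDD formulation, which the abstract does not give. The function names, the sample count of 64, the fixed seed, and the use of a plain mismatch fraction as the distance are all assumptions made for illustration only.

```python
import random


def sampled_mismatch_fraction(a, b, n_samples=64, seed=0):
    """Estimate the fraction of differing elements between two binary
    vectors by comparing n_samples randomly chosen positions.

    Hypothetical helper: the cost depends on n_samples, not on len(a).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    mismatches = 0
    for _ in range(n_samples):
        i = rng.randrange(len(a))  # random element position
        mismatches += a[i] != b[i]
    return mismatches / n_samples


def inverse_aware_fraction(a, b, n_samples=64, seed=0):
    """Variant that treats a vector and its exact inverse as the same
    pattern: a mismatch fraction near 1 is folded back toward 0.

    Assumption: min(d, 1 - d) stands in for an inverse-equality distance.
    """
    d = sampled_mismatch_fraction(a, b, n_samples, seed)
    return min(d, 1.0 - d)


if __name__ == "__main__":
    pattern = [0, 1] * 500                 # a 1000-element binary vector
    inverse = [1 - x for x in pattern]     # its exact inverse
    print(sampled_mismatch_fraction(pattern, pattern))
    print(sampled_mismatch_fraction(pattern, inverse))
    print(inverse_aware_fraction(pattern, inverse))
```

In a pre-matching stage, a candidate whose sampled fraction exceeds some chosen threshold would be discarded before the full comparison; the threshold and the number of samples would have to be chosen from the probabilistic model, which this sketch does not attempt.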

Publisher

World Scientific and Engineering Academy and Society (WSEAS)

Subject

General Mathematics
