An Empirical Evaluation of Similarity Coefficients for Binary Valued Data

Published:2011-04 Issue:2 Volume:7 Page:44-66
ISSN:1548-3924
Container-title:International Journal of Data Warehousing and Mining
language:en

Author:

Lewis David M.¹, Janeja Vandana P.²

Affiliation:

1. Carnegie Mellon University, USA

2. University of Maryland, Baltimore County, USA

Abstract

In this paper, the authors present an empirical evaluation of similarity coefficients for binary valued data. Similarity coefficients provide a means to measure the similarity or distance between two binary valued objects in a dataset such that the attributes qualifying each object have a 0-1 value. This is useful in several domains, such as similarity of feature vectors in sensor networks, document search, router network mining, and web mining. The authors survey 35 similarity coefficients used in various domains and present conclusions about the efficacy of the similarity computed in (1) labeled data to quantify the accuracy of the similarity coefficients, (2) varying density of the data to evaluate the effect of sparsity of the values, and (3) varying number of attributes to see the effect of high dimensionality in the data on the similarity computed.
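As a rough illustration of what a similarity coefficient for binary valued data computes, the sketch below implements two widely known coefficients, Jaccard and simple matching, from the usual 2x2 match counts. The function names, the example vectors, and the convention of returning 0.0 for an empty denominator are assumptions made for this example; the abstract does not list which 35 coefficients the authors surveyed, so this is not a reproduction of their method.

```python
from typing import Sequence, Tuple


def contingency_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two equal-length 0/1 vectors.

    a: positions where both are 1
    b: positions where x is 1 and y is 0
    c: positions where x is 0 and y is 1
    d: positions where both are 0
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of attributes")
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard coefficient a / (a + b + c); ignores 0-0 matches."""
    a, b, c, _ = contingency_counts(x, y)
    denom = a + b + c
    # Assumed convention: two all-zero vectors get similarity 0.0.
    return a / denom if denom else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching coefficient (a + d) / (a + b + c + d); counts 0-0 matches."""
    a, b, c, d = contingency_counts(x, y)
    n = a + b + c + d
    # Assumed convention: empty vectors get similarity 0.0.
    return (a + d) / n if n else 0.0


# Hypothetical example objects with six binary attributes each.
x = [1, 1, 0, 0, 1, 0]
y = [1, 0, 0, 0, 1, 1]
print(contingency_counts(x, y))  # (2, 1, 1, 2)
print(jaccard(x, y))             # 0.5
print(simple_matching(x, y))     # 4/6, about 0.667
```

The two coefficients differ only in whether shared zeros count as agreement, which is one reason the density of the data can change how similar two objects appear.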

Publisher

IGI Global

Subject

Hardware and Architecture, Software

References: 24 articles.

1. Adam, N. R., Janeja, V. P., & Atluri, V. (2004). Neighborhood based detection of anomalies in high dimensional spatio-temporal sensor datasets. In Proceedings of the 2004 ACM Symposium on Applied Computing.

2. A k-mean clustering algorithm for mixed numeric and categorical data

3. Comparison of maize similarity and dissimilarity genetic coefficients based on microsatellite markers

4. Similarity of Binary Data

5. Choosing the best similarity index when performing fuzzy set ordination on binary data

Cited by 10 articles.

1. Two Probabilistic Models for Quick Dissimilarity Detection of Big Binary Data;WSEAS TRANSACTIONS ON MATHEMATICS;2021-05-19

2. Informationsgenerierung;Business Intelligence & Analytics – Grundlagen und praktische Anwendungen;2021

3. Discovering Similarity Across Heterogeneous Features;International Journal of Data Warehousing and Mining;2020-10

4. DeciTrustNET: A graph based trust and reputation framework for social networks;Information Fusion;2020-09

5. Probabilistic binary similarity distance for quick binary image matching;IET Image Processing;2018-10