Similarity-Based Predictive Models: Sensitivity Analysis and a Biological Application with Multi-Attributes

Author:

Sanchez Jeniffer D. (1); Rêgo Leandro C. (1, 2); Ospina Raydonal (2, 3); Leiva Víctor (4); Chesneau Christophe (5); Castro Cecilia (6)

Affiliation:

1. Department of Statistics and Applied Mathematics, Universidade Federal do Ceará, Fortaleza 60020-181, Brazil

2. Department of Statistics, Universidade Federal de Pernambuco, Recife 50670-901, Brazil

3. Department of Statistics, IME, Universidade Federal da Bahia, Salvador 40170-110, Brazil

4. School of Industrial Engineering, Pontificia Universidad Católica de Valparaíso, Valparaíso 2362807, Chile

5. Department of Mathematics, Université de Caen, 14032 Caen, France

6. Centre of Mathematics, Universidade do Minho, 4710-057 Braga, Portugal

Abstract

Predictive models based on empirical similarity are instrumental in biology and data science, where the premise is to measure the likeness of one observation with others in the same dataset. Biological datasets often encompass data that can be categorized. When using empirical similarity-based predictive models, two strategies for handling categorical covariates exist. The first strategy retains categorical covariates in their original form, applying distance measures and allocating weights to each covariate. In contrast, the second strategy creates binary variables, representing each variable level independently, and computes similarity measures solely through the Euclidean distance. This study performs a sensitivity analysis of these two strategies using computational simulations, and applies the results to a biological context. We use a linear regression model as a reference point, and consider two methods for estimating the model parameters, alongside exponential and fractional inverse similarity functions. The sensitivity is evaluated by determining the coefficient of variation of the parameter estimators across the three models as a measure of relative variability. Our results suggest that the first strategy excels over the second one in effectively dealing with categorical variables, and offers greater parsimony due to the use of fewer parameters.
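The two strategies described in the abstract can be illustrated with a small sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the prediction is taken to be a similarity-weighted average of the observed responses, the exponential similarity is taken as exp(-d), the fractional inverse similarity as 1/(1 + d), the categorical distance in the first strategy as a 0/1 mismatch, and one weight is attached to each column in both strategies. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def exponential_similarity(d):
    """Assumed form of the exponential similarity: exp(-d)."""
    return math.exp(-d)


def fractional_inverse_similarity(d):
    """Assumed form of the fractional inverse similarity: 1 / (1 + d)."""
    return 1.0 / (1.0 + d)


def mixed_distance(a, b, weights, is_categorical):
    """Strategy 1 (sketch): keep categorical covariates as they are.

    Numeric covariates contribute a squared difference, categorical ones a
    0/1 mismatch; each covariate has exactly one weight.
    """
    total = 0.0
    for x, y, w, cat in zip(a, b, weights, is_categorical):
        diff = (0.0 if x == y else 1.0) if cat else (x - y) ** 2
        total += w * diff
    return total


def one_hot(value, levels):
    """Binary indicator for every level of a categorical covariate."""
    return [1.0 if value == level else 0.0 for level in levels]


def weighted_euclidean(a, b, weights):
    """Strategy 2 (sketch): Euclidean distance on the dummy-coded columns."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def predict(query, rows, targets, distance, similarity):
    """Similarity-weighted average of the observed responses."""
    sims = [similarity(distance(query, row)) for row in rows]
    return sum(s * y for s, y in zip(sims, targets)) / sum(sims)


def coefficient_of_variation(estimates):
    """Sample standard deviation divided by the absolute mean."""
    return stdev(estimates) / abs(mean(estimates))


# Made-up toy data: (numeric covariate, categorical covariate) and a response.
rows = [(1.0, "A"), (2.0, "B"), (3.0, "A")]
targets = [10.0, 20.0, 30.0]
query = (2.0, "A")

# Strategy 1: two covariates, hence two weights.
w1 = [0.5, 1.0]
is_cat = [False, True]


def distance_s1(a, b):
    return mixed_distance(a, b, w1, is_cat)


pred_s1 = predict(query, rows, targets, distance_s1, exponential_similarity)

# Strategy 2: the categorical covariate becomes two dummies, hence three weights.
levels = ["A", "B"]
w2 = [0.5, 1.0, 1.0]


def encode(row):
    return [row[0]] + one_hot(row[1], levels)


def distance_s2(a, b):
    return weighted_euclidean(a, b, w2)


encoded_rows = [encode(r) for r in rows]
pred_s2 = predict(encode(query), encoded_rows, targets, distance_s2,
                  fractional_inverse_similarity)

# Relative variability of a set of (made-up) parameter estimates.
cv = coefficient_of_variation([0.9, 1.0, 1.1])

print(pred_s1, pred_s2, cv)
```

Under these assumptions, the first strategy needs one weight per covariate, whereas the dummy coding needs one weight per level. This is one way to read the parsimony argument in the abstract. In a simulation study, the coefficient of variation would be computed over the parameter estimates obtained across replications rather than over a hand-written list.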

Funder

National Council for Scientific and Technological Development

Comissão de Aperfeiçoamento de Pessoal do Nível Superior

FONDECYT

Portuguese funds through the CMAT-Research Centre of Mathematics, University of Minho

Publisher

MDPI AG

Subject

General Agricultural and Biological Sciences; General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology
