G-bic: generating synthetic benchmarks for biclustering-Reference-Cited by-同舟云学术

G-bic: generating synthetic benchmarks for biclustering

Published:2023-12-06 Issue:1 Volume:24 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Castanho Eduardo N.,Lobo João P.,Henriques Rui,Madeira Sara C.

Abstract

Abstract Background Biclustering is increasingly used in biomedical data analysis, recommendation tasks, and text mining domains, with hundreds of biclustering algorithms proposed. When assessing the performance of these algorithms, more than real datasets are required as they do not offer a solid ground truth. Synthetic data surpass this limitation by producing reference solutions to be compared with the found patterns. However, generating synthetic datasets is challenging since the generated data must ensure reproducibility, pattern representativity, and real data resemblance. Results We propose G-Bic, a dataset generator conceived to produce synthetic benchmarks for the normative assessment of biclustering algorithms. Beyond expanding on aspects of pattern coherence, data quality, and positioning properties, it further handles specificities related to mixed-type datasets and time-series data.G-Bic has the flexibility to replicate real data regularities from diverse domains. We provide the default configurations to generate reproducible benchmarks to evaluate and compare diverse aspects of biclustering algorithms. Additionally, we discuss empirical strategies to simulate the properties of real data. Conclusion G-Bic is a parametrizable generator for biclustering analysis, offering a solid means to assess biclustering solutions according to internal and external metrics robustly.

Funder

Fundação para a Ciência e a Tecnologia

H2020-RIA

INESC-ID Research Unit

LASIGE Research Unit

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/s12859-023-05587-4.pdf

Reference67 articles.