New models for symbolic data analysis

Published: 2022-09-19
ISSN: 1862-5347
Journal: Advances in Data Analysis and Classification
Language: en
Journal abbreviation: Adv Data Anal Classif

Authors

Boris Beranger, Huan Lin, Scott Sisson

Abstract

Symbolic data analysis (SDA) is an emerging area of statistics concerned with understanding and modelling data that takes distributional form (i.e. symbols), such as random lists, intervals and histograms. It was developed under the premise that the statistical unit of interest is the symbol, and that inference is required at this level. Here we consider a different perspective, which opens a new research direction in the field of SDA. We assume that, as with a standard statistical analysis, inference is required at the level of individual-level data. However, the individual-level data are unobserved, and are aggregated into observed symbols (group-based distributional-valued summaries) prior to the analysis. We introduce a novel general method for constructing likelihood functions for symbolic data based on a desired probability model for the underlying measurement-level data, while only observing the distributional summaries. This approach opens the door for new classes of symbol design and construction, in addition to developing SDA as a viable tool to enable and improve upon classical data analyses, particularly for very large and complex datasets. We illustrate this new direction for SDA research through several real and simulated data analyses, including a study of novel classes of multivariate symbol construction techniques.
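As a hedged illustration of the abstract's central idea (this is the textbook grouped-data likelihood, not necessarily the authors' construction), suppose the symbol is a histogram: fixed bin edges b_0 < b_1 < ... < b_B and the counts n_k of unobserved individual-level values falling in each bin. If those values are assumed iid from a model with CDF F_θ, the counts are multinomial, so the symbol's likelihood is L(θ) ∝ ∏_k [F_θ(b_k) − F_θ(b_{k−1})]^{n_k}. The Python sketch below assumes a Normal model for the individual-level data; the function names `normal_cdf` and `histogram_symbol_loglik` and the example counts are hypothetical.

```python
import math

# Hypothetical helpers for illustration only; not taken from the paper.


def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma^2); math.erf maps +/-inf to +/-1, so infinite edges work."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def histogram_symbol_loglik(edges, counts, mu, sigma):
    """Log-likelihood of one histogram symbol, assuming the unobserved
    individual-level values are iid Normal(mu, sigma^2).

    edges:  bin boundaries b_0 < b_1 < ... < b_B (b_0 = -inf and b_B = +inf allowed)
    counts: n_k = number of individuals in bin (b_{k-1}, b_k], k = 1..B
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("edges must have exactly one more entry than counts")
    n = sum(counts)
    # Log of the multinomial coefficient n! / (n_1! ... n_B!).
    loglik = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    for lo, hi, c in zip(edges[:-1], edges[1:], counts):
        # Probability the assumed individual-level model assigns to this bin.
        p = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
        if c == 0:
            continue  # empty bins contribute a factor of p**0 = 1
        if p <= 0.0:
            return -math.inf  # observed counts in a bin the model rules out
        loglik += c * math.log(p)
    return loglik


# Usage: a symbol summarising 30 unobserved values, and a crude grid search
# for mu with sigma held at 1 (illustrative only, not an efficient optimiser).
edges = [-math.inf, -1.0, 0.0, 1.0, math.inf]
counts = [3, 10, 12, 5]
grid = [i / 100 for i in range(-200, 201)]
mu_hat = max(grid, key=lambda m: histogram_symbol_loglik(edges, counts, m, 1.0))
```

Replacing `normal_cdf` with another CDF changes the assumed individual-level model while the observed symbol stays the same, which mirrors the separation the abstract draws between the distributional summary and the model for the underlying data.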

Funders

Australian Centre of Excellence for Mathematical and Statistical Frontiers

Australian Research Council Discovery Project Scheme

Australian Research Council Fellowship

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Statistics and Probability

Link

https://link.springer.com/content/pdf/10.1007/s11634-022-00520-8.pdf

References (48 articles)

1. Andrieu C, Roberts GO (2009) The pseudo-marginal approach for efficient Monte Carlo computations. Ann Stat 37:697–725

2. Bardenet R, Doucet A, Holmes C (2014) Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach. In: Proceedings of the 31st international conference on machine learning (ICML-14), pp 405–413

3. Billard L (2011) Brief overview of symbolic data and analytic issues. Stat Anal Data Min 4:149–156

4. Billard L, Diday E (2003) From the statistics of data to the statistics of knowledge: symbolic data analysis. J Am Stat Assoc 98:470–487

5. Billard L, Diday E (2006) Symbolic data analysis. Wiley Series in Computational Statistics. Wiley, Chichester

Cited by (7 articles)

1. Nonparametric estimation and forecasting of interval-valued time series regression models with constraints. Expert Systems with Applications, 2024-09.

2. Image Feature Extraction Using Symbolic Data of Cumulative Distribution Functions. Mathematics, 2024-07-03.

3. 3-D probability density imaging of Euler solutions using gravity data: a case study of Mount Milligan, Canada. Acta Geophysica, 2024-01-31.

4. MLE for the parameters of bivariate interval-valued model. Advances in Data Analysis and Classification, 2023-06-18.

5. Asymptotic Distribution of Certain Types of Entropy under the Multinomial Law. Entropy, 2023-04-28.