
Upscaling human activity data: A statistical ecology approach

Published: 2021-07-01 · Volume: 16 · Issue: 7 · Page: e0253461
ISSN: 1932-6203
Journal: PLOS ONE
Language: en
Short journal title: PLoS ONE

Authors:

Tovo Anna, Stivanello Samuele, Maritan Amos, Suweis Samir, Favaro Stefano, Formentin Marco

Abstract

Big data require new techniques to handle the information they contain. Here we consider four datasets (email communication, Twitter posts, Wikipedia articles and Gutenberg books) and propose a novel statistical framework to predict global statistics from random samples. More precisely, from a small sample of emails sent per sender, posts per hashtag and word occurrences, we infer the number of senders, hashtags and words in the whole dataset and how their abundances (e.g. the popularity of a hashtag) change across scales. Our approach is grounded in statistical ecology, as we map the inference of human activities onto the unseen species problem in biodiversity. Our findings may have applications to resource management in email, collective attention monitoring on Twitter and language-learning processes in word databases.
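The abstract frames each dataset as an unseen species problem. In this framing, senders, hashtags and words play the role of species, and emails, posts and word occurrences play the role of individuals. The Python sketch below shows one way such upscaling can work. It rests on two assumptions of ours, which are not necessarily the authors' exact estimator. First, abundances follow a negative binomial NB(r, xi). Second, the sample keeps each record independently with probability p. Under this binomial thinning, a negative binomial remains negative binomial with the same r and with xi_p = p*xi / (1 - xi*(1 - p)). A fit on the sample can therefore be mapped back to full scale. All function names, grid ranges and the toy sender list are hypothetical.

```python
import math
from collections import Counter

# Assumptions for this sketch: abundances ~ negative binomial NB(r, xi), and the
# sample keeps each record (email, post, word token) independently with prob. p.


def abundances(labels):
    """Map records to 'species' abundances, e.g. one sender label per sampled email."""
    return list(Counter(labels).values())


def zt_nb_loglik(freq, r, xi):
    """Log-likelihood of a zero-truncated NB(r, xi); freq maps k >= 1 to #species."""
    log_norm = math.log1p(-((1.0 - xi) ** r))  # log P(k >= 1)
    return sum(
        f * (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + k * math.log(xi) + r * math.log1p(-xi) - log_norm)
        for k, f in freq.items()
    )


def fit_zt_nb(abund):
    """Crude maximum-likelihood fit of (r, xi_p) on the sample by grid search."""
    freq = Counter(abund)
    r_grid = [0.05 * i for i in range(1, 101)]    # r in 0.05 .. 5.0 (arbitrary range)
    xi_grid = [0.005 * i for i in range(1, 200)]  # xi in 0.005 .. 0.995
    _, r, xi = max(
        (zt_nb_loglik(freq, r, xi), r, xi) for r in r_grid for xi in xi_grid
    )
    return r, xi


def xi_at_fraction(xi, p):
    """Binomial thinning: NB(r, xi) at full scale becomes NB(r, xi_p) in a p-sample."""
    return p * xi / (1.0 - xi * (1.0 - p))


def xi_full_from_sample(xi_p, p):
    """Invert xi_at_fraction: recover the full-scale xi from the sample's xi_p."""
    return xi_p / (p + xi_p * (1.0 - p))


def upscale_richness(s_p, r, xi_p, p):
    """Estimate full-scale species count from s_p species seen in a p-sample.

    Both scales share one (hypothetical) pool of species with NB(r, .) abundances,
    so observed richness at each scale is pool size times P(k >= 1) at that scale.
    """
    xi = xi_full_from_sample(xi_p, p)
    return s_p * (1.0 - (1.0 - xi) ** r) / (1.0 - (1.0 - xi_p) ** r)


if __name__ == "__main__":
    # Hypothetical sample: one sender label per sampled email, with p = 10%.
    sample = ["alice", "bob", "alice", "carol", "alice", "bob", "dave", "erin"]
    ks = abundances(sample)
    r, xi_p = fit_zt_nb(ks)
    print(f"r={r:.2f}, xi_p={xi_p:.3f}")
    print(f"observed senders: {len(ks)}")
    print(f"upscaled senders: {upscale_richness(len(ks), r, xi_p, p=0.1):.1f}")
```

The grid search keeps the sketch dependency-free. On real data, a numerical optimizer such as `scipy.optimize.minimize` would be a more natural choice for fitting r and xi_p. Because xi_full_from_sample always gives xi > xi_p when p < 1, the upscaled richness is never smaller than the observed one. At p = 1 the two coincide.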

Funders

Progetto Dottorati - Fondazione Cassa di Risparmio di Padova e Rovigo

neXt grant

STARS grant 2019 from University of Padova

University of Padova through “Excellence Project 2018” of the Cariparo foundation

H2020 European Research Council

Italian Ministry of Education, University and Research (MIUR), “Dipartimenti di Eccellenza”

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

References (48 articles)

1. A. Chao. Species richness: estimation and comparison. Wiley StatsRef: Statistics Reference Online, 2014.

2. M. Favretti. Remarks on the maximum entropy principle with application to the maximum entropy theory of ecology. Entropy, 2018.

3. M. Favretti. Maximum entropy theory of ecology: a reply to Harte. Entropy, 2018.

4. I. Good. The number of new species, and the increase in population coverage, when a sample is increased. Biometrika, 1956.

5. J. Harte. Biodiversity scales from plots to biomes with a universal species–area curve. Ecology Letters, 2009.