Evaluating Awkward Arrays, uproot, and coffea as a query platform for High Energy Physics Data

Published: 2023-02-01 Issue: 1 Volume: 2438 Page: 012033
ISSN: 1742-6588
Container-title: Journal of Physics: Conference Series
Short-container-title: J. Phys.: Conf. Ser.

Authors:

Gray L., Smith N.

Abstract

Query languages for High Energy Physics (HEP) are an ever-present topic within the field. A query language that can efficiently represent the nested data structures that encode the statistical and physical meaning of HEP data will help analysts by making their code clearer and more pertinent. As the result of a multi-year effort to develop an in-memory columnar representation of high energy physics data, the NumPy, Awkward Array, and uproot Python packages present a mature and efficient interface to HEP data. Atop that base, the coffea package adds functionality to launch queries at scale, manage and apply experiment-specific transformations to data, and present a rich object-oriented columnar data representation to the analyst. Recently, a set of Analysis Description Language (ADL) benchmarks has been established to compare HEP queries in multiple languages and frameworks. In this paper we present these benchmark queries implemented within the coffea framework and discuss their readability and performance characteristics. We find that the columnar queries perform as well as or better than the implementations given in previous studies.
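To make the idea of a columnar representation of nested HEP data more concrete, here is a minimal sketch. It assumes plain NumPy rather than Awkward Array itself. It mimics the flat-content-plus-offsets layout that Awkward Array uses for variable-length lists, so that a per-event jet selection runs as whole-array operations instead of a Python loop over events. The toy jet values, the helper names `event_index` and `count_per_event`, and the |eta| < 1 cut are all hypothetical illustrations. They are not taken from the paper's benchmark code.

```python
import numpy as np

# Hypothetical toy data: 4 events containing 3, 0, 2 and 1 jets.
# Jagged data is stored columnarly as flat content arrays plus offsets;
# event i owns jets offsets[i] .. offsets[i+1]-1.
offsets = np.array([0, 3, 3, 5, 6])
jet_pt = np.array([50.0, 32.0, 12.0, 45.0, 20.0, 60.0])
jet_eta = np.array([0.5, -1.2, 0.3, 2.1, -0.8, 0.9])


def event_index(offsets):
    """Map each flat jet to the index of the event that owns it."""
    counts = np.diff(offsets)
    return np.repeat(np.arange(len(counts)), counts)


def count_per_event(mask, offsets):
    """Count selected jets per event without looping over events."""
    n_events = len(offsets) - 1
    return np.bincount(event_index(offsets)[mask], minlength=n_events)


# Query sketch in the spirit of an ADL-style selection (hypothetical cut):
central = np.abs(jet_eta) < 1.0              # per-jet boolean mask
central_pt = jet_pt[central]                 # flattened pT of selected jets
n_central = count_per_event(central, offsets)
events_with_two = np.nonzero(n_central >= 2)[0]

# Per-event scalar sum of selected-jet pT (0.0 for events with none).
ht_central = np.bincount(
    event_index(offsets)[central],
    weights=jet_pt[central],
    minlength=len(offsets) - 1,
)
```

With Awkward Array the offsets bookkeeping is hidden behind the nested array. For a record array `jets` with `pt` and `eta` fields, the same selection could be written as `jets.pt[abs(jets.eta) < 1]`, followed by `ak.sum(..., axis=1)` for the per-event sum. That expressiveness is what the abstract refers to when it speaks of readable queries over nested data.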

Publisher

IOP Publishing

Subject

Computer Science Applications, History, Education

Link

https://iopscience.iop.org/article/10.1088/1742-6596/2438/1/012033/pdf
