Boost recall in quasi-stellar object selection from highly imbalanced photometric datasets. The reverse selection method

Authors

Calderone Giorgio, Guarneri Francesco, Porru Matteo, Cristiani Stefano, Grazian Andrea, Nicastro Luciano, Bischetti Manuela, Boutsia Konstantina, Cupani Guido, D'Odorico Valentina, Feruglio Chiara, Fontanot Fabio

Abstract

The identification of bright quasi-stellar objects (QSOs) is of fundamental importance to probe the intergalactic medium and address open questions in cosmology. Several approaches have been adopted to find such sources in the currently available photometric surveys, including machine learning methods. However, the rarity of bright QSOs at high redshifts compared to other contaminating sources (such as stars and galaxies) makes the selection of reliable candidates a difficult task, especially when high completeness is required. We present a novel technique to boost recall (i.e., completeness within the considered sample) in the selection of QSOs from photometric datasets dominated by stars, galaxies, and low-$z$ QSOs (imbalanced datasets). Our heuristic method operates by iteratively removing sources whose probability of belonging to a noninteresting class exceeds a user-defined threshold, until the remaining dataset contains mainly high-$z$ QSOs. Any existing machine learning method can be used as the underlying classifier, provided it allows for a classification probability to be estimated. We applied the method to a dataset obtained by cross-matching PanSTARRS1 (DR2), Gaia (DR3), and WISE, and identified the high-$z$ QSO candidates using both our method and its direct multi-label counterpart. We ran several tests by randomly choosing the training and test datasets, and achieved significant improvements in recall, which increased from $\sim 50\%$ to $\sim 85\%$ for QSOs with $z > 2.5$, and from $\sim 70\%$ to $\sim 90\%$ for QSOs with $z > 3$. Also, we identified a sample of 3098 new QSO candidates on a sample of $2.6 \times 10^6$ sources with no known classification. We obtained follow-up spectroscopy for 121 candidates, confirming 107 new QSOs with $z > 2.5$. Finally, a comparison of our QSO candidates with those selected by an independent method based on Gaia spectroscopy shows that the two samples overlap by more than $90\%$ and that both selection methods are potentially capable of achieving a high level of completeness.
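The iterative removal described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the toy centroid-based classifier `centroid_proba`, the helper names `reverse_selection`, `recall` and `blob`, the choice to handle one noninteresting class per pass in alphabetical order and to retrain after dropping that class from the training set, the threshold of 0.9, and the hand-made two-feature data. Any classifier that returns class probabilities could stand in for `centroid_proba`.

```python
import math


def centroid_proba(train_X, train_y, X):
    """Toy classifier (an assumption, not the paper's): softmax over
    negative squared distances to each class centroid."""
    classes = sorted(set(train_y))
    centroids = {}
    for c in classes:
        rows = [x for x, y in zip(train_X, train_y) if y == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    probs = []
    for x in X:
        score = {c: -sum((a - b) ** 2 for a, b in zip(x, centroids[c]))
                 for c in classes}
        top = max(score.values())  # subtract the maximum for numerical stability
        weight = {c: math.exp(s - top) for c, s in score.items()}
        total = sum(weight.values())
        probs.append({c: w / total for c, w in weight.items()})
    return probs


def reverse_selection(train_X, train_y, X, target, threshold=0.9):
    """Return indices of X that survive removal of every non-target class.

    Assumed schedule: one pass per non-target class; after each pass that
    class is dropped from the training set and the classifier is retrained.
    """
    keep = list(range(len(X)))
    train = list(zip(train_X, train_y))
    for cls in sorted(set(train_y) - {target}):
        tX = [x for x, _ in train]
        ty = [y for _, y in train]
        probs = centroid_proba(tX, ty, [X[i] for i in keep])
        # Discard only sources that confidently belong to the unwanted class.
        keep = [i for i, p in zip(keep, probs) if p[cls] <= threshold]
        train = [(x, y) for x, y in train if y != cls]
    return keep


def recall(selected, truth, target):
    """Fraction of true target sources that appear in the selection."""
    wanted = {i for i, y in enumerate(truth) if y == target}
    return len(wanted & set(selected)) / len(wanted)


def blob(cx, cy):
    """Nine hypothetical training points on a small grid around (cx, cy)."""
    offsets = (-0.5, 0.0, 0.5)
    return [(cx + dx, cy + dy) for dx in offsets for dy in offsets]


# Hand-made two-feature data; the labels are illustrative only.
train_X = blob(0, 0) + blob(5, 0) + blob(0, 5)
train_y = ["star"] * 9 + ["galaxy"] * 9 + ["qso"] * 9
X = [(0.2, 0.1), (4.9, 0.3), (0.1, 4.8), (-0.3, 5.2),
     (5.2, -0.1), (0.4, -0.2), (0.0, 2.4)]
truth = ["star", "galaxy", "qso", "qso", "galaxy", "star", "qso"]

survivors = reverse_selection(train_X, train_y, X, target="qso")

# Direct counterpart: keep sources whose most probable class is the target.
direct = [i for i, p in enumerate(centroid_proba(train_X, train_y, X))
          if max(p, key=p.get) == "qso"]

print("direct recall: ", recall(direct, truth, "qso"))
print("reverse recall:", recall(survivors, truth, "qso"))
```

In this contrived example the borderline source at (0.0, 2.4) is assigned to the star class by the direct selection, but it survives the reverse selection because its star probability stays below the 0.9 threshold. This is the mechanism by which recall can rise, possibly at the cost of admitting more contaminants among the survivors.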

Publisher

EDP Sciences

Subject

Space and Planetary Science, Astronomy and Astrophysics
