Training from Zero: Forecasting of Radio Frequency Machine Learning Data Quantity

Published: 2024-07-18 Volume: 5 Issue: 3 Pages: 632-651
ISSN: 2673-4001
Journal: Telecom
Language: English

Authors:

William H. Clark¹, Alan J. Michaels¹

Affiliation:

1. Virginia Tech National Security Institute, Blacksburg, VA 24060, USA

Abstract

The data used during training in any given application space are directly tied to the performance of the system once deployed. While the Neural Scaling Law within Machine Learning attributes high model performance to many other factors as well, there is no doubt that the data used to train a system provide the foundation from which to build. One of the underlying heuristics in the Machine Learning space is that more data leads to better models, but there is no easy answer to the question, “How much data is needed to achieve the desired level of performance?” This work examines a modulation classification problem in the Radio Frequency domain, attempting to answer the question of how much training data is required to achieve a desired level of performance, although the procedure readily applies to classification problems across modalities. The ultimate goal is to determine an approach that requires the least initial data collection, so that it can inform a more thorough collection effort that achieves the desired performance metric. By forecasting the performance of the model rather than the loss value, this approach gives a more intuitive understanding of data volume requirements. While this approach requires an initial dataset, the goal is for that initial collection to be orders of magnitude smaller than what is required to deliver a system that achieves the desired performance. An additional benefit of the techniques presented here is that the quality of different datasets can be numerically evaluated and tied to the quantity of data and, ultimately, to the performance of the architecture in the problem domain.
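The abstract does not spell out its forecasting model, so the following is only a minimal sketch of the general idea under one stated assumption: that classification error falls as a pure power law in the number of training samples, error(n) = b * n^(-c), with no irreducible error floor. It fits that curve to accuracies measured on a few small training subsets, then inverts it to estimate how many samples a target accuracy would need. The function names `fit_power_law_error` and `samples_for_target` and the pilot numbers are hypothetical illustrations, not the authors' method or data.

```python
import numpy as np


def fit_power_law_error(n_samples, accuracy):
    """Fit error(n) = b * n**(-c) by least squares in log-log space.

    Assumption (hypothetical model): error = 1 - accuracy follows a pure
    power law with no irreducible floor. Every accuracy must be < 1 so that
    log(error) is defined. Returns (b, c).
    """
    n = np.asarray(n_samples, dtype=float)
    err = 1.0 - np.asarray(accuracy, dtype=float)
    if np.any(err <= 0):
        raise ValueError("all accuracies must be strictly below 1")
    # log(err) = log(b) - c * log(n) is a straight line in log-log space
    slope, intercept = np.polyfit(np.log(n), np.log(err), 1)
    return float(np.exp(intercept)), float(-slope)


def samples_for_target(b, c, target_accuracy):
    """Invert error(n) = b * n**(-c) for the n that reaches target_accuracy."""
    if c <= 0:
        raise ValueError("fitted exponent must be positive (error must shrink)")
    target_err = 1.0 - target_accuracy
    if target_err <= 0:
        raise ValueError("target accuracy must be below 1")
    return (b / target_err) ** (1.0 / c)


if __name__ == "__main__":
    # Hypothetical pilot-study results (synthetic, not from the paper)
    sizes = [1_000, 2_000, 4_000, 8_000]
    accs = [1 - 2.0 / np.sqrt(n) for n in sizes]
    b, c = fit_power_law_error(sizes, accs)
    n_needed = samples_for_target(b, c, 0.99)
    print(f"b={b:.3f}, c={c:.3f}, samples needed for 99% accuracy: {n_needed:,.0f}")
```

Fitting each candidate dataset separately and comparing the `samples_for_target` estimates is one simple way to put a number on dataset quality relative to quantity: the dataset that reaches the target with fewer samples is the more data-efficient one under this model. Real modulation-classification curves often plateau below 100% accuracy (for example at low SNR), in which case a saturating form such as acc(n) = a - b * n^(-c), fitted with nonlinear least squares, would be a more appropriate assumption.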

Funder

Office of the Director of National Intelligence

Intelligence Advanced Research Projects Activity

Publisher

MDPI AG

Link

https://www.mdpi.com/2673-4001/5/3/32/pdf
