Affiliation:
1. Institute of Statistics and AI Center, RWTH Aachen University, Aachen, Germany
2. Faculty of Information and Telecommunication Technology, Wrocław University of Science and Technology, Wrocław, Poland
Abstract
Motivated by applications in statistics and machine learning such as individualized care monitoring or watermark detection in large language models, we consider the following general setting: when monitoring a sequence of observations, , there may be additional information, , on the environment, which should be used to design the monitoring procedure. This additional information can be incorporated by applying threshold functions to the standardized measurements, thereby adapting the detector to the environment. For the case where discrete-valued environmental information is encoded as categorical data, we study several classes of level threshold functions, including a proportional one that favors rare events among imbalanced classes. For the latter rule, asymptotic theory is developed for independent and identically distributed as well as dependent learning samples, including data from new discrete autoregressive moving-average (NDARMA) series and hidden Markov models. Further, we propose two-stage designs that allow the budget to be distributed in a controlled way over an a priori partition of the sample space of . The approach is illustrated using a real medical data set.
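The central idea of comparing standardized measurements with thresholds that depend on a discrete environment category can be illustrated by the minimal Python sketch below. It is not the authors' method. Several choices here are hypothetical and made only for illustration: the global standardization using a learning sample, the per-category empirical-quantile thresholds, the user-supplied per-category levels (`levels`), and the function names `fit_category_thresholds` and `monitor`. The proportional rule and the two-stage designs from the paper are not reproduced.

```python
import numpy as np


def fit_category_thresholds(learn_x, learn_z, levels):
    """Estimate one threshold per environment category from a learning sample.

    learn_x: measurements assumed to be in control (hypothetical setup).
    learn_z: discrete environment category for each measurement.
    levels:  dict mapping category -> per-category level alpha_k
             (an illustrative, user-chosen allocation).
    Returns the standardization constants and a dict of thresholds
    on the standardized scale.
    """
    learn_x = np.asarray(learn_x, dtype=float)
    learn_z = np.asarray(learn_z)
    # Standardize all measurements with the pooled learning-sample mean and sd.
    mu, sigma = learn_x.mean(), learn_x.std(ddof=1)
    standardized = (learn_x - mu) / sigma
    thresholds = {}
    for category, alpha in levels.items():
        s_k = standardized[learn_z == category]
        if s_k.size == 0:
            raise ValueError(f"no learning data for category {category!r}")
        # Empirical (1 - alpha_k) quantile within category k.
        thresholds[category] = np.quantile(s_k, 1.0 - alpha)
    return mu, sigma, thresholds


def monitor(new_x, new_z, mu, sigma, thresholds):
    """Return the index of the first alarm, or None if no alarm is raised."""
    for i, (x, z) in enumerate(zip(new_x, new_z)):
        # Compare the standardized observation with its category's threshold.
        if (x - mu) / sigma > thresholds[z]:
            return i
    return None


# Usage with simulated in-control learning data and two categories.
rng = np.random.default_rng(0)
x_learn = rng.standard_normal(2000)
z_learn = rng.integers(0, 2, size=2000)
mu, sigma, th = fit_category_thresholds(x_learn, z_learn, {0: 0.05, 1: 0.01})
first_alarm = monitor([0.0, 0.0, 10.0], [0, 1, 0], mu, sigma, th)
```

Assigning a category a larger level lowers its threshold and makes the detector more sensitive in that environment. How the levels should be chosen across imbalanced categories is the question addressed by the level threshold functions studied in the paper.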