Authors:
Chakrabartty Satyendra Nath, Kangrui Wang, Chakrabarty Dalia
Abstract
Policy decisions are often motivated by results attained by a cohort of responders to a survey or a test. However, erroneous identification of the reliability, or the complementary uncertainty, of the test/survey instrument will distort the data that such policy decisions are based upon. Thus, robust learning of the uncertainty of such an instrument is sought. This uncertainty is parametrised by the departure from reproducibility of the data comprising responses to the questions of this instrument, given the responders. Such departure is best modelled using the distance between the data on responses to the questions that comprise the two similar subtests that the given test/survey can be split into. This paper presents three fast and robust ways of learning the optimal subtests that a given test/survey instrument can be split into, to allow for reliable learning of the uncertainty of the given instrument, where the response to a question is either binary or categorical (taking values at multiple levels), and the test/survey instrument is realistically heterogeneous in the correlation structure of the questions (or items); prone to measuring multiple traits; and built of a small to very large number of items. Our methods work in the presence of such messiness in real tests and surveys, which typically violates the applicability of conventional methods. We illustrate our new methods by computing the uncertainty of three real tests and surveys that are large to very large in size, subsequent to learning the optimal subtests.
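For context on the split-half construction that the abstract refers to, the following is a minimal sketch of the conventional baseline: split the items into two halves, correlate the subtest totals, and apply the Spearman-Brown step-up. The random half-split, the function name split_half_reliability, and the use of NumPy are illustrative assumptions; the paper's contribution is learning an optimal split rather than the naive random one shown here.

```python
import numpy as np

def split_half_reliability(responses, rng=None):
    """Classical split-half reliability with Spearman-Brown correction.

    responses: (n_responders, n_items) array of item scores (binary
    or categorical levels coded as integers).

    NOTE: the random half-split below is the naive baseline, not the
    optimal-subtest learning proposed in the paper.
    """
    rng = np.random.default_rng(rng)
    n_items = responses.shape[1]
    perm = rng.permutation(n_items)
    # Total score on each randomly chosen half of the instrument.
    half_a = responses[:, perm[: n_items // 2]].sum(axis=1)
    half_b = responses[:, perm[n_items // 2 :]].sum(axis=1)
    r = np.corrcoef(half_a, half_b)[0, 1]  # correlation of subtest totals
    return 2 * r / (1 + r)                 # Spearman-Brown step-up to full length

# Hypothetical usage: 200 responders, 40 binary items.
scores = np.random.default_rng(0).integers(0, 2, size=(200, 40))
print(split_half_reliability(scores, rng=1))
```

A low split-half correlation signals low reproducibility, i.e. high instrument uncertainty; because the value depends on which split is chosen, the choice of subtests is exactly what needs to be optimised.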