Abstract
This work presents a statistical method for segmenting samples and/or populations, based on a ternary tree structure. The approach overcomes known limitations of other segmentation methods, such as CHAID, in handling a multivariate response and the non-symmetric relationship between explanatory and response variables. The multivariate response segmentation problem is handled through latent class models, while the factorial decomposition of the explanatory capability of the variables is based on Non-Symmetrical Correspondence Analysis. Stopping criteria based on the CATANOVA index and on impurity measures are proposed. A post-pruning strategy based on Simulated Annealing is considered in order to avoid over-fitting to the training set and to improve the generalization capability of the method.
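As an illustration of the CATANOVA index mentioned above, the following minimal sketch computes the standard Light and Margolin statistic C = (n - 1)(I - 1) BSS/TSS for a two-way contingency table, where BSS/TSS coincides with the Goodman-Kruskal tau. The function name `catanova` and the table layout (rows as response categories, columns as explanatory categories) are assumptions made for this example; the abstract does not specify how the index enters the stopping rule, so this shows only the index itself.

```python
def catanova(table):
    """Return (tau, C, df) for a contingency table of counts.

    table[i][j] is the count for response category i (row) and
    explanatory category j (column). This layout is an assumption
    of the sketch, not something fixed by the source.
    """
    n_rows = len(table)
    n_cols = len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]

    # Total variation of the categorical response (Gini-type measure).
    tss = n / 2 - sum(r * r for r in row_tot) / (2 * n)
    if tss == 0:
        raise ValueError("response has a single category; index undefined")

    # Variation of the response within each explanatory category.
    wss = n / 2 - 0.5 * sum(
        sum(table[i][j] ** 2 for i in range(n_rows)) / col_tot[j]
        for j in range(n_cols)
        if col_tot[j] > 0
    )

    bss = tss - wss                    # variation explained by the predictor
    tau = bss / tss                    # Goodman-Kruskal tau
    c = (n - 1) * (n_rows - 1) * tau   # CATANOVA C statistic
    df = (n_rows - 1) * (n_cols - 1)   # chi-square reference degrees of freedom
    return tau, c, df
```

Under the hypothesis of no association, C is approximately chi-square distributed with the returned degrees of freedom, so a node could, for instance, be left unsplit when C falls below a chosen critical value; that particular use is an assumption of this sketch.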
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)