Affiliation:
1. Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
2. Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
Abstract
Statistical data simulation is essential in the development of statistical models and methods as well as in their performance evaluation. To capture complex data structures, in particular for high‐dimensional data, a variety of simulation approaches have been introduced, including parametric and so‐called plasmode simulations. While there are concerns about the realism of parametrically simulated data, it is widely claimed that plasmodes come very close to reality while some aspects of the “truth” remain known. However, there are no explicit guidelines or established state of the art on how to perform plasmode data simulations. In the present paper, we first review the existing literature and introduce the concept of statistical plasmode simulation. We then discuss the advantages and challenges of statistical plasmodes and provide a step‐wise procedure for their generation, including key steps for their implementation and reporting. Finally, we illustrate the concept of statistical plasmodes and the proposed plasmode generation procedure using a publicly available real RNA data set on breast carcinoma patients.