Augmented Reality (AR) allows users to interact with virtual content within a real-world environment. This paper proposes a child avatar simulation framework that uses multimodal data integration and user interaction in an AR environment. The framework generates a child avatar that can interact with the user and respond to the user's behaviors. It consists of three subsystems: (1) the avatar interaction system, which analyzes user behaviors based on user data; (2) the avatar action control system, which generates naturalistic avatar activities (actions and voices) according to the avatar's internal status; and (3) the avatar display system, which renders the avatar through the AR interface. In addition, a child tantrum management training application has been built on the proposed framework, with a lightweight machine learning model integrated to enable efficient and effective speech emotion recognition. Based on an evaluation by clinical child psychologists, the application provides sufficiently realistic child tantrum management training and helps users become familiar with child tantrum management.