Video-based human action recognition has become an active research topic in computer vision in recent years and is widely used in intelligent human-computer interaction and virtual reality. However, most existing methods and public datasets target everyday human activities, and assessing basketball skills remains a challenging problem. To address this, the authors propose a coarse-to-fine, video-based metric learning framework for basketball skills assessment. Specifically, they first represent each action video jointly with several models, and then learn an optimal distance metric between videos on top of this representation. Finally, using the learned metric, a query video is first coarsely classified to obtain its action label, and then finely classified to determine whether the action is performed to standard. Experiments on a dataset collected by the authors show that the proposed framework can better identify and assess non-standard basketball actions.
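The coarse-to-fine decision step can be illustrated with a small sketch. The abstract does not say which models produce the features, how they are fused, which metric learning algorithm is used, or which classifier is applied, so everything below is an assumption for illustration only: features from several models are fused by concatenation, the learned metric is a Mahalanobis distance parameterized by a linear map `L` (so that d(x, y) = ||Lx - Ly||^2) that is taken as already learned, and both stages use k-nearest-neighbour voting. All function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Vector = List[float]


def joint_representation(per_model_features: Sequence[Vector]) -> Vector:
    """Fuse per-model feature vectors by concatenation (assumed fusion strategy)."""
    fused: Vector = []
    for feats in per_model_features:
        fused.extend(feats)
    return fused


def project(L: Sequence[Vector], x: Vector) -> Vector:
    """Apply the (assumed, already learned) linear map L to a feature vector."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]


def metric_distance(L: Sequence[Vector], x: Vector, y: Vector) -> float:
    """Squared Mahalanobis distance with M = L^T L, i.e. ||Lx - Ly||^2."""
    px, py = project(L, x), project(L, y)
    return sum((a - b) ** 2 for a, b in zip(px, py))


def knn_vote(L: Sequence[Vector], query: Vector,
             gallery: Sequence[Tuple[Vector, Hashable]], k: int) -> Hashable:
    """Return the majority label among the k gallery items nearest to the query."""
    ranked = sorted(gallery, key=lambda item: metric_distance(L, query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def coarse_to_fine(L: Sequence[Vector], query: Vector,
                   gallery: Sequence[Tuple[Vector, str, bool]],
                   k: int = 3) -> Tuple[Hashable, Hashable]:
    """Coarse stage: predict the action label over the whole gallery.
    Fine stage: among videos of that action only, predict standard vs. non-standard.
    Gallery items are (features, action_label, is_standard)."""
    coarse = [(v, action) for v, action, _ in gallery]
    action = knn_vote(L, query, coarse, k)
    fine = [(v, std) for v, a, std in gallery if a == action]
    is_standard = knn_vote(L, query, fine, min(k, len(fine)))
    return action, is_standard


if __name__ == "__main__":
    # Toy 2-D gallery; real features would come from the video models.
    gallery = [
        ([0.0, 0.0], "shooting", True),
        ([0.2, 0.1], "shooting", True),
        ([0.5, 0.4], "shooting", False),
        ([5.0, 5.0], "dribbling", True),
        ([5.2, 4.9], "dribbling", False),
        ([5.3, 5.1], "dribbling", False),
    ]
    identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a learned L
    query = joint_representation([[5.25], [5.0]])  # two "models", one feature each
    print(coarse_to_fine(identity, query, gallery))
```

Restricting the fine stage to videos that share the predicted action label reflects the coarse-to-fine idea described above: the standard/non-standard judgement is only made against examples of the same action.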