This paper introduces an approach to urban traffic flow prediction (TFP) that uses big data and deep learning (DL) to improve accuracy and reduce the large errors common in traditional methods, which in turn can support more sustainable urban development. First, an Attention-CNN-GRU-ResNet (ACGR) TFP model is built as a DL network by gridding the urban traffic flow (TF) into a three-dimensional spatio-temporal (S-T) tensor sequence. An attention-based GRU is then introduced, which adds spatial and channel attention to the traditional GRU so that the time dependence and S-T heterogeneity of TF in each subset are effectively extracted. Finally, a ResNet module is introduced to capture the S-T dependency while avoiding the network degradation caused by excessive layers. Results show that the proposed method achieves the lowest RMSE, MAE, and MAPE, with values of 18.32, 10.66, and 5.34, respectively. This research offers a new way to alleviate data sparsity and account for differences among input features, and a novel approach to the S-T learning tasks associated with modeling.
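To make the gridding step and the reported error measures concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each traffic record has already been discretized into integer (time step, grid row, grid column) indices, and that MAPE is expressed in percent and computed only over cells with nonzero true flow. The function names `grid_flow` and `error_metrics` and all parameters are hypothetical, chosen for illustration only.

```python
import numpy as np


def grid_flow(records, n_steps, n_rows, n_cols):
    """Count records per (time step, grid row, grid column) cell.

    `records` is an integer array of shape (N, 3) holding hypothetical,
    already-discretized (t, row, col) indices, one row per observation.
    Returns a tensor of shape (n_steps, n_rows, n_cols).
    """
    records = np.asarray(records, dtype=np.int64)
    tensor = np.zeros((n_steps, n_rows, n_cols), dtype=np.float64)
    # np.add.at accumulates correctly when the same cell appears repeatedly.
    np.add.at(tensor, (records[:, 0], records[:, 1], records[:, 2]), 1.0)
    return tensor


def error_metrics(y_true, y_pred, eps=1e-8):
    """Return (RMSE, MAE, MAPE in percent) for two arrays of equal shape."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    # Assumption: skip cells whose true flow is zero to avoid division by zero.
    mask = np.abs(y_true) > eps
    mape = float(100.0 * np.mean(np.abs(err[mask] / y_true[mask])))
    return rmse, mae, mape


if __name__ == "__main__":
    # Three toy observations: two in cell (0, 0) at step 0, one in cell (1, 2) at step 1.
    flow = grid_flow([[0, 0, 0], [0, 0, 0], [1, 1, 2]], n_steps=2, n_rows=2, n_cols=3)
    print(flow.shape)
    print(error_metrics([10.0, 20.0], [12.0, 16.0]))
```

Counting records per cell is only one possible way to populate the tensor; other quantities, such as inflow and outflow channels, would require a different accumulation rule.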