To address the shortcomings of existing passenger flow prediction methods, namely low accuracy, inadequate learning of the spatial features of station topology, and inapplicability to large networks, a passenger flow forecasting method for urban rail transit based on SAE-GCN-BiLSTM is proposed. First, external features are extracted layer by layer using a stacked autoencoder (SAE). Then, a graph convolutional network (GCN) captures the spatial features of the station topology, and a bidirectional long short-term memory (BiLSTM) network extracts bidirectional temporal features, which together yield the spatio-temporal features. Finally, the external features and the spatio-temporal features are fused to predict urban rail transit passenger flow. Experimental results show that the proposed method outperforms several other advanced models on the evaluation metrics at different granularities, indicating that it effectively improves the accuracy and robustness of urban rail transit passenger flow prediction.
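
To make the spatial step concrete, the following is a minimal sketch of a single graph-convolution layer applied to a station graph. It is illustrative only and is not the authors' implementation: the symmetric normalization with self-loops, the ReLU activation, the three-station line, the flow values, and the names `normalized_adjacency`, `matmul`, and `gcn_layer` are all assumptions introduced here, since the abstract does not specify the GCN variant or any data.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix.

    Assumption: symmetric normalization with self-loops, a common GCN
    choice; the abstract does not state which variant is used.
    """
    n = len(adj)
    # Add self-loops so each station keeps its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(A_norm @ X @ W)."""
    a_norm = normalized_adjacency(adj)
    hidden = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Hypothetical 3-station line: station 0 - station 1 - station 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[10.0], [20.0], [30.0]]  # made-up passenger counts per station
weights = [[1.0]]                    # identity weight, for illustration only

out = gcn_layer(adj, features, weights)
```

Each row of `out` mixes a station's own flow with that of its direct neighbors, which is the sense in which a GCN layer encodes station topology. In a full model along the lines described above, such spatial features would be computed per time step and passed to a bidirectional recurrent network before fusion with the external features.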