The existing Industrial Internet of Things (IIoT) temporal data analysis methods often suffer from issues such as information loss, difficulty balancing spatial and temporal features, and being affected by training data noise, which can lead to varying degrees of reduced model accuracy. Therefore, a new anomaly detection method was proposed, which integrated Transformer and adversarial training. Firstly, a bidirectional spatiotemporal feature extraction module was constructed by combining Graph Attention Networks (GAT) and Bidirectional Gated Recurrent Unit (BiGRU), which can simultaneously extract spatial and temporal features. Then, by combining multi-scale convolution with Long Short-Term Memory (LSTM), multi-scale contextual information was captured. Finally, an improved Transformer was used to fuse multi-dimensional features, combined with an adversarial-trained variational autoencoder to calculate the anomalies of the input data. This method outperforms other comparison models by conducting experiments on four publicly available datasets.