Abstract:
Vibration spectrum extraction is essential for fault diagnosis of rotating machinery. Diverse operating environments and the presence of noise limit the performance of traditional single-modal vibration extraction methods. Because visual and audio signals differ in sampling frequency, noise characteristics, and environmental constraints, audio-visual fusion can effectively compensate for the limitations of any single modality. Accordingly, this paper proposes a wideband spectrum extraction method based on an audio-visual fusion deep convolutional neural network, which fully fuses the complementary information of the two modalities. The proposed model uses a dual-stream encoder to extract features from each modality; a deep residual fusion module then extracts high-level fused features and feeds them to the decoder. Experimental results show that the proposed model outperforms state-of-the-art vibration extraction models such as RegNet, MFCNN, and L2L, improving the accuracy of vibration spectrum extraction by 15% in noisy environments.
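
To make the described architecture concrete, the following is a minimal sketch of a dual-stream encoder, residual fusion module, and decoder in PyTorch. All layer widths, kernel sizes, input shapes, and class names are illustrative assumptions; the abstract does not specify the paper's exact configuration.

```python
# Minimal sketch of a dual-stream audio-visual fusion network with a
# residual fusion module and a decoder. All hyperparameters are assumptions.
import torch
import torch.nn as nn


class ResidualFusionBlock(nn.Module):
    """1-D residual block applied to the fused feature stream."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))


class AudioVisualSpectrumNet(nn.Module):
    def __init__(self, audio_ch=1, video_ch=1, feat_ch=64, out_bins=512, n_res=4):
        super().__init__()
        # Dual-stream encoder: one convolutional branch per modality.
        self.audio_enc = nn.Sequential(
            nn.Conv1d(audio_ch, feat_ch, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool1d(out_bins),
        )
        self.video_enc = nn.Sequential(
            nn.Conv1d(video_ch, feat_ch, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool1d(out_bins),
        )
        # Deep residual fusion module over the concatenated streams.
        self.fusion = nn.Sequential(
            nn.Conv1d(2 * feat_ch, feat_ch, kernel_size=1),
            *[ResidualFusionBlock(feat_ch) for _ in range(n_res)],
        )
        # Decoder maps fused features to the wideband vibration spectrum.
        self.decoder = nn.Sequential(
            nn.Conv1d(feat_ch, feat_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(feat_ch, 1, kernel_size=1),
        )

    def forward(self, audio, video):
        fa = self.audio_enc(audio)              # (B, feat_ch, out_bins)
        fv = self.video_enc(video)              # (B, feat_ch, out_bins)
        fused = self.fusion(torch.cat([fa, fv], dim=1))
        return self.decoder(fused).squeeze(1)   # (B, out_bins) spectrum estimate


if __name__ == "__main__":
    model = AudioVisualSpectrumNet()
    audio = torch.randn(2, 1, 16000)   # e.g. a short high-rate audio segment
    video = torch.randn(2, 1, 1024)    # e.g. a lower-rate per-frame motion signal
    print(model(audio, video).shape)   # torch.Size([2, 512])
```

The per-branch pooling to a common feature length is one simple way to reconcile the different sampling rates of the two modalities before fusion; the actual alignment strategy used in the paper may differ.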