Efficient Reconstruction of Spatial Features for Remote Sensing Image-Text Retrieval
Abstract:
Remote sensing cross-modal image-text retrieval (RSCIR) can flexibly retrieve remote sensing images from a user's query text, and has attracted increasing attention from researchers. However, as the parameter counts of visual-language pre-training models grow, direct transfer learning consumes substantial computational and storage resources. Moreover, recently proposed parameter-efficient transfer learning methods focus mainly on reconstructing channel features, ignoring the spatial features that are vital for modeling key entity relationships. To address these issues, we design an efficient transfer learning framework for RSCIR based on spatial feature efficient reconstruction (SPER). A concise and efficient spatial adapter is introduced to enhance the extraction of spatial relationships. The spatial adapter spatially reconstructs features in the backbone with few parameters while incorporating prior information from the channel dimension. We conduct quantitative and qualitative experiments on two commonly used RSCIR datasets. Compared with traditional methods, our approach improves the sumR metric by 3%–11%. Compared with methods that fine-tune all parameters, our method trains less than 1% of the parameters while retaining about 96% of the overall performance. The relevant code and files are released at https://github.com/AICyberTeam/SPER.
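The abstract describes a bottleneck-style spatial adapter: features are down-projected along the channel dimension (preserving a channel prior), mixed across spatial positions, then up-projected back with a residual connection. The paper's exact architecture is not given here, so the following is only a minimal NumPy sketch under those assumptions; all names, shapes, and the token-mixing design are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_adapter(x, w_down, w_spatial, w_up):
    """Illustrative spatial-adapter sketch (hypothetical, not the SPER code).

    x: token features of shape (N, C), where N = H*W spatial positions.
    1. Channel down-projection keeps a compact prior from the channel dim.
    2. A small N x N matrix mixes information across spatial tokens,
       reconstructing spatial relationships.
    3. Up-projection restores C; the residual preserves backbone features.
    """
    h = x @ w_down              # (N, C) -> (N, r): channel bottleneck
    h = w_spatial @ h           # (N, N) @ (N, r): spatial token mixing
    h = np.maximum(h, 0.0)      # ReLU nonlinearity
    return x + h @ w_up         # residual add back to (N, C)

N, C, r = 49, 768, 8            # e.g. 7x7 tokens, ViT-Base width, rank-8
x = rng.standard_normal((N, C))
w_down = rng.standard_normal((C, r)) * 0.02
w_spatial = rng.standard_normal((N, N)) * 0.02
w_up = np.zeros((r, C))         # zero-init so the adapter starts as identity

y = spatial_adapter(x, w_down, w_spatial, w_up)
assert y.shape == (N, C)
assert np.allclose(y, x)        # zero-init up-projection: no initial change

adapter_params = w_down.size + w_spatial.size + w_up.size
full_layer_params = C * C
print(adapter_params, full_layer_params)  # the adapter is far smaller
```

The zero-initialized up-projection is a common adapter trick (the module is an identity at the start of training, so the pre-trained backbone is undisturbed), and the parameter count illustrates how such a bottleneck can stay well under 1% of a full layer's weights, consistent with the efficiency claim in the abstract.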
Project Supported:
This work was supported by the National Key R&D Program of China (No.2022ZD0118402).
ZHANG Weihang, CHEN Jialiang, ZHANG Wenkai, LI Xinming, GAO Xin, SUN Xian. Efficient Reconstruction of Spatial Features for Remote Sensing Image-Text Retrieval[J]. Transactions of Nanjing University of Aeronautics & Astronautics, 2025, (1): 101-111.