Multi-modal Fine-grained Retrieval with Local and Global Cross-Attention

Qiaosong Chen*, Ye Zhang, Junzhuo Liu, Zhixiang Wang, Xin Deng, Jin Wang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Cross-modal retrieval lets a user submit a sample of any modality as a query and returns related samples from the various modalities. Current cross-modal retrieval methods focus mainly on coarse-grained retrieval, which falls short of the needs of practical applications. Fine-grained retrieval, however, poses several difficulties: the heterogeneity gap and semantic gap between multi-modal data, the difficulty of measuring similarity, and the small differences between fine-grained sample features. To overcome these difficulties, we propose a novel multi-modal fine-grained retrieval method built on the LAGC-Attention module, which extracts and fuses feature information from different modalities and represents it in a common space. Specifically, we use local and global cross self-attention to extract neighboring and global context information for each modality, which greatly enhances the feature representation capability of each modality (image, text, audio, video) and, in particular, reduces the gap between their feature distributions. Extensive experiments and ablation studies demonstrate that our method achieves state-of-the-art performance on the public dataset PKU FG-XMedia.
Original language: English
Title of host publication: ICUFN 2023 - 14th International Conference on Ubiquitous and Future Networks
Publisher: IEEE Xplore
Pages: 1-7
Number of pages: 7
Edition: 1
ISBN (Electronic): 9798350335385
Publication status: Published - 1 Jan 2023
Event: 14th International Conference on Ubiquitous and Future Networks - Paris, France
Duration: 4 Jul 2023 to 7 Jul 2023
Conference number: 14

Publication series

Series: International Conference on Ubiquitous and Future Networks, ICUFN
Volume: 2023-July
ISSN: 2165-8528

Conference

Conference: 14th International Conference on Ubiquitous and Future Networks
Abbreviated title: ICUFN 2023
Country/Territory: France
City: Paris
Period: 4/07/23 to 7/07/23

Keywords

  • Cross-media fine-grained retrieval
  • Heterogeneity gap
  • Local and global cross-attention
