An Image-Text Matching Method for Multi-Modal Robots
Author(s): Ke Zheng (Hunan Biological and Electromechanical Polytechnic, China) and Zhou Li (Hunan Biological and Electromechanical Polytechnic, China)
Copyright: 2024
Volume: 36
Issue: 1
Pages: 21
Source title: Journal of Organizational and End User Computing (JOEUC)
Editor(s)-in-Chief: Sangbing (Jason) Tsai (Wuyi University, China & International Engineering and Technology Institute (IETI), Hong Kong) and Wei Liu (Qingdao University, China)
DOI: 10.4018/JOEUC.334701
Abstract
With the rapid development of artificial intelligence and deep learning, image-text matching has gradually become an important research topic in cross-modal fields. Achieving correct image-text matching requires a strong understanding of the correspondence between visual and textual information. In recent years, deep learning-based image-text matching methods have achieved significant success. However, image-text matching requires both a deep understanding of intra-modal information and the exploration of fine-grained alignment between image regions and textual words, and integrating these two aspects into a single model remains a challenge. Reducing the internal complexity of the model and effectively constructing and utilizing prior knowledge are also worth exploring. This work therefore addresses the excessive computational complexity of existing fine-grained matching methods and their lack of multi-perspective matching.
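The abstract does not detail the paper's architecture. As a rough, generic illustration only (not the authors' method), the sketch below shows one common way fine-grained region-word alignment is scored: cosine similarities between detected image-region features and word features, pooled with max-over-regions per word. All function names, shapes, and the pooling choice are assumptions for illustration.

```python
import numpy as np

def cosine_similarity_matrix(regions, words):
    """Pairwise cosine similarity between region and word features.

    regions: (k, d) array of image-region features (e.g., from a detector).
    words:   (n, d) array of word features (e.g., from a text encoder).
    Returns an (n, k) similarity matrix.
    """
    r = regions / (np.linalg.norm(regions, axis=1, keepdims=True) + 1e-8)
    w = words / (np.linalg.norm(words, axis=1, keepdims=True) + 1e-8)
    return w @ r.T

def image_text_score(regions, words):
    """Aggregate fine-grained alignments into one image-text score.

    Each word keeps its best-matching region; the per-word maxima are
    averaged into a single scalar (max-over-regions pooling).
    """
    sim = cosine_similarity_matrix(regions, words)  # (n, k)
    best_per_word = sim.max(axis=1)                 # (n,)
    return float(best_per_word.mean())

# Toy usage: 4 detected regions and 3 caption words in a shared 8-dim space.
rng = np.random.default_rng(0)
regions = rng.normal(size=(4, 8))
words = rng.normal(size=(3, 8))
print(image_text_score(regions, words))
```

In retrieval settings, a score like this is typically computed for every image-caption pair and trained with a ranking loss so that matched pairs score higher than mismatched ones.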