IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Automatic Image Captioning Using Different Variants of the Long Short-Term Memory (LSTM) Deep Learning Model

Author(s): Ritwik Kundu (Vellore Institute of Technology, Vellore, India), Shaurya Singh (Vellore Institute of Technology, Vellore, India), Geraldine Amali (Vellore Institute of Technology, Vellore, India), Mathew Mithra Noel (Vellore Institute of Technology, Vellore, India) and Umadevi K. S. (Vellore Institute of Technology, Vellore, India)
Copyright: 2023
Pages: 24
Source title: Deep Learning Research Applications for Natural Language Processing
Source Author(s)/Editor(s): L. Ashok Kumar (PSG College of Technology, India), Dhanaraj Karthika Renuka (PSG College of Technology, India) and S. Geetha (Vellore Institute of Technology, India)
DOI: 10.4018/978-1-6684-6001-6.ch008


Abstract

Today's world is full of digital images, but their context is often unavailable, which makes image captioning essential for describing an image's content. Besides generating accurate captions, an image captioning model must also be scalable. In this chapter, two variants of long short-term memory (LSTM), namely stacked LSTM and bidirectional LSTM (BiLSTM), are combined with convolutional neural networks (CNNs) to implement an encoder-decoder model for caption generation. The bilingual evaluation understudy (BLEU) metric is used to evaluate the performance of these two bi-layered models. The study found the two models to be on par in captioning performance: some predictions received low BLEU scores, indicating captions dissimilar to the reference, while others received very high BLEU scores, indicating captions close to human-written ones. Furthermore, the bidirectional LSTM model proved more computationally intensive and slower to train than the stacked LSTM model owing to its more complex architecture.
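The BLEU metric referenced in the abstract scores a candidate caption by its n-gram overlap with a reference caption, scaled by a brevity penalty. The sketch below is a minimal, simplified sentence-level BLEU (add-one smoothing is an assumption chosen here so that a missing n-gram order does not zero the score; the example captions are hypothetical, and real evaluations would typically use a library implementation such as NLTK's):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, candidate, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions (with add-one smoothing) times a brevity
    penalty that punishes candidates shorter than the reference."""
    precisions = []
    for n in range(1, max_n + 1):
        ref_counts = Counter(ngrams(reference, n))
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # add-one smoothing
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: < 1 only when the candidate is shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * geo_mean

# Hypothetical captions for illustration.
reference = "a dog runs across the grass".split()
good = "a dog runs across the grass".split()
bad = "two cats sit on a mat".split()
```

A perfect match scores 1.0, while an unrelated caption like `bad` scores near zero, mirroring the spread of low and high BLEU scores the chapter reports across its predicted captions.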
