IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Automatic Image Captioning Using Different Variants of the Long Short-Term Memory (LSTM) Deep Learning Model

Author(s): Ritwik Kundu (Vellore Institute of Technology, Vellore, India), Shaurya Singh (Vellore Institute of Technology, Vellore, India), Geraldine Amali (Vellore Institute of Technology, Vellore, India), Mathew Mithra Noel (Vellore Institute of Technology, Vellore, India) and Umadevi K. S. (Vellore Institute of Technology, Vellore, India)
Copyright: 2023
Pages: 24
Source title: Deep Learning Research Applications for Natural Language Processing
Source Author(s)/Editor(s): L. Ashok Kumar (PSG College of Technology, India), Dhanaraj Karthika Renuka (PSG College of Technology, India) and S. Geetha (Vellore Institute of Technology, India)
DOI: 10.4018/978-1-6684-6001-6.ch008


Abstract

Today's world is full of digital images, but their context is often unavailable, which makes image captioning essential for describing an image's content. Besides generating accurate captions, an image captioning model must also be scalable. In this chapter, two variants of long short-term memory (LSTM), namely stacked LSTM and bidirectional LSTM (BiLSTM), are combined with convolutional neural networks (CNNs) to implement an encoder-decoder model for caption generation. The bilingual evaluation understudy (BLEU) metric is used to evaluate the performance of these two bi-layered models. The study found the two models to be on par in captioning performance: some predictions received low BLEU scores, indicating captions dissimilar to the reference, while others received very high BLEU scores, indicating captions close to human-written ones. Furthermore, the bidirectional LSTM model proved more computationally intensive and slower to train than the stacked LSTM model owing to its more complex architecture.
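The BLEU metric referenced in the abstract scores a candidate caption by its n-gram overlap with a reference caption, scaled by a brevity penalty. The sketch below is a minimal, simplified sentence-level BLEU (add-one smoothing is an assumption chosen here so that a missing n-gram order does not zero the score; the example captions are hypothetical, and real evaluations would typically use a library implementation such as NLTK's):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, candidate, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions (with add-one smoothing) times a brevity
    penalty that punishes candidates shorter than the reference."""
    precisions = []
    for n in range(1, max_n + 1):
        ref_counts = Counter(ngrams(reference, n))
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # add-one smoothing
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: < 1 only when the candidate is shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * geo_mean

# Hypothetical captions for illustration.
reference = "a dog runs across the grass".split()
good = "a dog runs across the grass".split()
bad = "two cats sit on a mat".split()
```

A perfect match scores 1.0, while an unrelated caption like `bad` scores near zero, mirroring the spread of low and high BLEU scores the chapter reports across its predicted captions.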
