
Published in:

Volume 8, Issue 1, January 2021
eISSN: 2349-5162


Unique Identifier

Published Paper ID: JETIR2101113
Registration ID: 305043

Page Number

842-855


Title

Create caption by extracting features from image and video using deep learning model

Authors

Abstract

Images and videos captured by various devices, as well as images available from sources such as the internet, news articles, and social media, often lack a proper description through which a human could understand them without observing them closely. It is also very difficult to manually write syntactically correct sentences describing a large set of images and videos. The description may also vary with each individual's perception, mood, and interpretation at the time the image is put into words, which can lead to captions with inaccurate, imperfect, or error-prone descriptions; this is not acceptable in scenarios where accuracy is the primary criterion for other systems to act upon. The aim here is to generate a sentence for an image by identifying its features using deep learning techniques, and to generate sentences for video frames using the same model employed for image captioning. Feature extraction from the images is performed using available pre-trained deep learning models such as VGG16, DenseNet121, and InceptionV3. The vocabulary was built by processing the image descriptions available as part of the dataset and removing the stop words. A deep learning model is built and trained to generate syntactically correct sentences by examining the extracted features, predicting the next sequence of words, and interpreting that sequence of words as a sentence. Experiments were carried out on several datasets, and the accuracy of the model was measured through the fluency of the language it learns solely from image descriptions, training on data from Flickr8k, Flickr30k, video frames, etc. BLEU scores are generated to compare model performance. A convolutional neural network (CNN) is used to extract features from the images, and a recurrent neural network (RNN), specifically a Long Short-Term Memory (LSTM) network, is used to generate the sequence of words. To generate captions for a video, key frames are extracted from the video by passing it through the key-frame extraction framework built into the application. The sequence of key frames extracted from the video is fed into the same image captioning model that was trained to generate captions for images, and captions for the video are produced by feeding these frames to the pre-trained model.
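Below is a minimal, illustrative sketch of the kind of CNN-encoder / LSTM-decoder pipeline the abstract describes, assuming TensorFlow/Keras as the framework. The vocab_size and max_length values, the 256-unit layer sizes, and the choice of VGG16 as the feature extractor are placeholder assumptions for illustration, not the authors' exact configuration.

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add

vocab_size = 5000   # placeholder: size of the vocabulary built from the dataset captions
max_length = 34     # placeholder: maximum caption length in tokens

# Feature extractor: pre-trained VGG16 with the classification head removed,
# so the 4096-dimensional fc2 activations serve as the image representation.
vgg = VGG16(weights="imagenet")
feature_extractor = Model(inputs=vgg.inputs, outputs=vgg.layers[-2].output)

def extract_features(image_path):
    """Return a (1, 4096) feature vector for one image or one video key frame."""
    img = load_img(image_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(img_to_array(img), axis=0))
    return feature_extractor.predict(x, verbose=0)

# Caption generator ("merge" architecture): the image features and the partial
# caption are encoded separately, added together, and used to predict the next word.
img_in = Input(shape=(4096,))
img_feat = Dense(256, activation="relu")(Dropout(0.5)(img_in))

seq_in = Input(shape=(max_length,))
seq_emb = Embedding(vocab_size, 256, mask_zero=True)(seq_in)
seq_feat = LSTM(256)(Dropout(0.5)(seq_emb))

merged = Dense(256, activation="relu")(add([img_feat, seq_feat]))
output = Dense(vocab_size, activation="softmax")(merged)

caption_model = Model(inputs=[img_in, seq_in], outputs=output)
caption_model.compile(loss="categorical_crossentropy", optimizer="adam")
caption_model.summary()

For video input, the same extract_features function would be applied to each extracted key frame (for example, frames read with OpenCV's cv2.VideoCapture and filtered by a key-frame selection step), and the generated captions can be compared against reference descriptions with a BLEU implementation such as nltk.translate.bleu_score.corpus_bleu.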

Key Words

Video Image Caption, Image Caption using Deep Learning Techniques, Image Segmentation, Key Frame Extraction

Cite This Article

"Create caption by extracting features from image and video using deep learning model", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN:2349-5162, Vol.8, Issue 1, page no.842-855, January-2021, Available :http://www.jetir.org/papers/JETIR2101113.pdf

ISSN

2349-5162


Publication Details

Published Paper ID: JETIR2101113
Registration ID: 305043
Published In: Volume 8 | Issue 1 | January 2021
DOI (Digital Object Identifier): http://doi.one/10.1729/Journal.25477
Page No: 842-855
Country: Kolkata, West Bengal, India
Area: Engineering
ISSN Number: 2349-5162
Publisher: IJ Publication


