UGC Approved Journal no 63975(19)

ISSN: 2349-5162 | ESTD Year : 2014
Volume 13 | Issue 3 | March 2026


Published in: Volume 12, Issue 11, November-2025
eISSN: 2349-5162

UGC and ISSN approved, 7.95 impact factor (calculated by Google Scholar), UGC Approved Journal no 63975

Unique Identifier

Published Paper ID: JETIR2511649
Registration ID: 572380
Page Number: g46-g55


Title

Review of AI Image Captioning: A Multimodal Perspective Towards Healthcare, Accessibility, And Inclusive Applications

Abstract

Image captioning, a field that combines computer vision and natural language processing, has advanced significantly in three key domains: real-world scenes, medical imaging, and captioning for visually impaired users. In real-world applications, foundational deep learning models such as CNNs and RNNs have been extended with attention mechanisms, reinforcement learning, and bidirectional LSTMs to improve contextual accuracy and language fluency. In medical settings, captioning involves interpreting biomedical images using contextualized representations and multimodal techniques that support automated diagnosis and clinical workflows. For visually impaired users, systems focus on accessibility: handling real-world conditions, reading text in images using datasets such as TextCaps, and improving multimodal inclusivity so that captions convey both the visual and the textual content of a scene. Open problems persist, including handling imbalanced datasets, improving contextual understanding, and ensuring usability in special-purpose settings. Future opportunities include refining domain-specific models, adopting emerging techniques such as transformer architectures, and improving accessibility across a wide range of applications. By closing these gaps, image captioning can further connect visual data with actionable insights in applications ranging from healthcare to assistive technologies, and can guide future researchers toward more robust, inclusive, and context-aware systems.

Key Words

Image Captioning, Computer Vision and Natural Language Processing (CV-NLP Integration), Attention Mechanisms, Multimodal Fusion (Visual-Textual-Audio), Contextual Understanding (Graph Neural Networks, Transformers)

Cite This Article

"Review of AI Image Captioning: A Multimodal Perspective Towards Healthcare, Accessibility, And Inclusive Applications", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol. 12, Issue 11, page no. g46-g55, November-2025, Available: http://www.jetir.org/papers/JETIR2511649.pdf

Publication Details

Published Paper ID: JETIR2511649
Registration ID: 572380
Published In: Volume 12 | Issue 11 | Year November-2025
DOI (Digital Object Identifier):
Page No: g46-g55
Country: Sonipat, Haryana, India
Area: Engineering
ISSN Number: 2349-5162
Publisher: IJ Publication

