UGC Approved Journal no 63975(19)
New UGC Peer-Reviewed Rules

ISSN: 2349-5162 | ESTD Year : 2014
Volume 12 | Issue 10 | October 2025


Published in:

Volume 12 Issue 7
July-2025
eISSN: 2349-5162

UGC and ISSN approved | Impact factor 7.95 (calculated by Google Scholar)

Unique Identifier

Published Paper ID:
JETIR2507513


Registration ID:
566725

Page Number

f110-f118


Title

Multimodal Emotion Recognition using Vision, Text, and Audio with Transformer Models for Real-Time Video Call Applications

Authors

Abstract

The growth of remote communication has created a need for systems that can recognise human emotions from multimodal inputs in video conferencing environments. Current emotion recognition systems rely predominantly on a single modality, which limits their ability to capture the complexity of human emotional expression. This research proposes a novel Adaptive Cross-Modal Transformer Fusion Network (ACMTFN) that integrates vision, text, and audio modalities using transformer architectures for real-time emotion recognition in video calls. The approach employs a hierarchical transformer-based architecture with a dedicated encoder for each modality: a fine-tuned Vision Transformer (ViT) for facial expressions, BERT for textual content, and Wav2Vec2 for audio. The core innovation is an adaptive cross-modal attention mechanism that dynamically weights inter-modal relationships according to contextual relevance, addressing the challenge of modality imbalance in multimodal learning. Evaluation on the MELD and CMU-MOSEI benchmark datasets yields 76.8% accuracy on MELD and a 79.4% F1-score on CMU-MOSEI, improvements of 4.2% and 3.8% respectively over existing state-of-the-art methods. The system also remains efficient enough for real-time use, with inference times averaging 85 ms per sample, which meets the latency requirements of video conferencing platforms. ACMTFN offers a practical basis for emotion-aware computing systems, with immediate applications in virtual meetings, mental health monitoring, and educational technology. The work advances affective computing by demonstrating how transformer-based multimodal fusion can help bridge the emotional gap in digital communication.
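The abstract does not give the equations behind the adaptive cross-modal attention mechanism, so the following is only a minimal sketch of the general idea of context-dependent modality weighting, not the authors' ACMTFN. It assumes each encoder (ViT, BERT, Wav2Vec2) has already produced a fixed-size embedding, and that a context vector is available to score them. The function name `adaptive_fusion`, the context vector, and the toy embeddings are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adaptive_fusion(
    embeddings: Dict[str, Vector], context: Vector
) -> Tuple[Dict[str, float], Vector]:
    """Fuse per-modality embeddings with context-dependent weights.

    Hypothetical stand-in for adaptive cross-modal attention: each
    modality is scored by a scaled dot product with the context vector,
    the scores are normalised with softmax, and the fused vector is the
    weighted sum of the modality embeddings.
    """
    names = sorted(embeddings)
    dim = len(context)
    scores = [dot(embeddings[n], context) / math.sqrt(dim) for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * embeddings[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused


# Toy embeddings (assumed values, not taken from the paper).
embeddings = {
    "vision": [1.0, 0.0, 0.0, 0.0],
    "text": [0.0, 1.0, 0.0, 0.0],
    "audio": [0.0, 0.0, 1.0, 0.0],
}
context = [0.2, 2.0, 0.1, 0.0]  # this context aligns most with the text embedding

weights, fused = adaptive_fusion(embeddings, context)
print(weights)
print(fused)
```

In this sketch the weights always sum to one, so a modality that carries little signal for the current context (for example, audio during a muted segment) is down-weighted rather than dropped. A trained model would learn the scoring function instead of using a fixed dot product.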

Key Words

Multimodal emotion recognition, transformer models, cross-modal attention, human-computer interaction, video conferencing, affective computing

Cite This Article

"Multimodal Emotion Recognition using Vision, Text, and Audio with Transformer Models for Real-Time Video Call Applications", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol. 12, Issue 7, page no. f110-f118, July 2025. Available: http://www.jetir.org/papers/JETIR2507513.pdf



Publication Details

Published Paper ID: JETIR2507513
Registration ID: 566725
Published In: Volume 12 | Issue 7 | Year July-2025
DOI (Digital Object Identifier):
Page No: f110-f118
Country: Panvel, Raigad, Maharashtra, India
Area: Engineering
ISSN Number: 2349-5162
Publisher: IJ Publication

