UGC Approved Journal no 63975(19)

ISSN: 2349-5162 | ESTD Year : 2014


Published in: Volume 12, Issue 5, May 2025 | eISSN: 2349-5162

UGC and ISSN approved | UGC Approved Journal no 63975 | Impact factor: 7.95 (calculated by Google Scholar)

Unique Identifier

Published Paper ID: JETIR2505791
Registration ID: 562783
Page Number: g891-g894


Title

A review on Speech Emotion Recognition from Raw Audio using LSTM and Neural Networks

Abstract

Speech Emotion Recognition (SER) has emerged as an important area of human-computer interaction (HCI), enabling machines to identify and respond to users' emotional states. Unlike text-based sentiment analysis, SER relies on acoustic cues, which makes it harder: speech is dynamic, and the way emotion is expressed in it is nuanced. With the spread of deep learning, especially architectures such as Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs), significant progress has been made in extracting emotional patterns directly from raw audio. This review examines recent advances in SER, focusing on methods that avoid extensive feature engineering by learning from the raw waveform. We analyze ten state-of-the-art studies that introduced novel techniques, including attention mechanisms, hybrid CNN-LSTM models, and end-to-end learning. Each method is evaluated in terms of the dataset used, its performance metrics (such as accuracy and F1-score), and its limitations. A key finding of the review is the growing use of raw audio input, which removes the dependence on hand-engineered representations such as MFCCs or spectrograms. However, existing approaches still struggle with generalization, dataset imbalance, and speaker variability. The paper also briefly outlines a proposed LSTM-based architecture designed to be more robust across diverse speech signals. Our findings highlight gaps in current research and suggest directions for future work, particularly multilingual datasets and unsupervised learning techniques for SER.

Key Words

Speech Emotion Recognition (SER), Raw Audio, LSTM, Deep Learning, CNN, Attention Mechanism, Human-Computer Interaction, Emotion Detection, End-to-End Learning, Neural Networks
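The abstract describes end-to-end SER models that feed raw waveform samples into an LSTM instead of hand-engineered features such as MFCCs. As a rough illustration only (not the authors' proposed architecture), the pure-Python sketch below splits a waveform into fixed-length frames of raw samples, runs a single LSTM cell over those frames, and maps the final hidden state to class probabilities with a softmax. The frame length (160 samples, i.e. 10 ms at an assumed 16 kHz sample rate), the hidden size, the four-label emotion set and all function names are hypothetical choices, and the weights are random and untrained, so the printed probabilities carry no meaning until the model is trained.

```python
import math
import random

# Hypothetical label set; the review does not fix one.
EMOTIONS = ["neutral", "happy", "sad", "angry"]


def frame_signal(samples, frame_len):
    """Split a raw waveform into non-overlapping frames, dropping any remainder."""
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LSTMCell:
    """Single LSTM cell with input (i), forget (f), candidate (g) and output (o) gates."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        scale = 1.0 / math.sqrt(cols)
        # One weight matrix and bias vector per gate, applied to [x_t, h_{t-1}].
        self.w = {k: [[rng.uniform(-scale, scale) for _ in range(cols)]
                      for _ in range(hidden_size)] for k in "ifgo"}
        self.b = {k: [0.0] * hidden_size for k in "ifgo"}

    def step(self, x, h, c):
        z = list(x) + list(h)
        pre = {k: [a + b for a, b in zip(matvec(self.w[k], z), self.b[k])]
               for k in "ifgo"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        g = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        # Cell state: keep part of the old memory, add part of the new candidate.
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def classify_raw_audio(samples, cell, w_out, frame_len=160):
    """Run the LSTM over raw-sample frames; map the last hidden state to class probabilities."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for frame in frame_signal(samples, frame_len):
        h, c = cell.step(frame, h, c)
    return softmax(matvec(w_out, h))


if __name__ == "__main__":
    rng = random.Random(0)
    sr = 16000  # assumed sample rate
    # Synthetic 0.5 s tone standing in for a real utterance.
    audio = [0.5 * math.sin(2 * math.pi * 220 * t / sr) for t in range(sr // 2)]
    cell = LSTMCell(input_size=160, hidden_size=8, rng=rng)
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(8)] for _ in EMOTIONS]
    for label, p in zip(EMOTIONS, classify_raw_audio(audio, cell, w_out)):
        print(f"{label}: {p:.3f}")
```

In practice such a model would be trained with backpropagation through time in a deep learning framework, and the hybrid CNN-LSTM models mentioned in the abstract often place convolutional layers in front of the LSTM so that frame-level features are learned from the raw samples rather than fed in directly.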

Cite This Article

"A review on Speech Emotion Recognition from Raw Audio using LSTM and Neural Networks", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol. 12, Issue 5, pp. g891-g894, May 2025. Available at: http://www.jetir.org/papers/JETIR2505791.pdf

An international scholarly open-access, peer-reviewed, refereed, multidisciplinary monthly journal.


Publication Details

DOI (Digital Object Identifier):
Location: Bhopal, MP, India
Area: Engineering
Publisher: IJ Publication

