UGC Approved Journal no 63975(19)

ISSN: 2349-5162 | ESTD Year : 2014
Volume 12 | Issue 10 | October 2025





Published in:

Volume 12 Issue 3
March-2025
eISSN: 2349-5162

UGC and ISSN approved | UGC Approved Journal no 63975 | Impact factor 7.95 (calculated by Google Scholar)

Unique Identifier

Published Paper ID:
JETIR2503210


Registration ID:
556553

Page Number

c77-c84


Title

Speech to Text Conversion

Abstract

Speech-to-text (STT) conversion, also known as automatic speech recognition (ASR), is the process of converting spoken language into written text. The technology has gained significant attention in recent years because of its potential applications in healthcare, customer service, education, and accessibility. The primary goal of an STT system is to transcribe human speech accurately, either in real time or from recorded audio, while handling variations in accent, speech patterns, background noise, and other real-world conditions. The conversion process typically involves four stages: audio signal preprocessing, feature extraction, pattern recognition, and text generation. The input audio is first captured with a microphone or recording device, and relevant features such as Mel-frequency cepstral coefficients (MFCCs) are then extracted to represent the speech signal. These features are processed by machine learning algorithms, ranging from traditional hidden Markov models (HMMs) to deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), including Long Short-Term Memory (LSTM) networks. The models are trained on large datasets to recognize and predict the phonetic, linguistic, and contextual elements of speech. Key challenges for speech-to-text systems include homophones, speech disfluencies, speaker variability, noise interference, and real-time processing demands. To improve accuracy and robustness, STT systems often incorporate noise filtering, speaker adaptation, language modeling, and context-based correction.
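The feature extraction stage mentioned in the abstract can be illustrated with a minimal MFCC sketch. This is not the authors' implementation: the function names, the pre-emphasis factor of 0.97, the Hamming window, the 256-sample frames, the 20 mel filters, and the 13 output coefficients are all assumptions chosen for illustration. It uses only the Python standard library and a naive DFT, so it is suitable for short signals only.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame, via a naive DFT."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (num_filters + 1)
                  for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    filters = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # peak and falling edge
            row[k] = (right - k) / (right - center)
        filters.append(row)
    return filters


def dct2(values, num_coeffs):
    """First num_coeffs terms of an unnormalized DCT-II."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for k in range(num_coeffs)]


def mfcc(signal, sample_rate, frame_len=256, hop=128, num_filters=20, num_coeffs=13):
    """Return one list of MFCCs per frame (all parameters are illustrative defaults)."""
    # Pre-emphasis boosts high frequencies before framing.
    emphasized = [signal[0]] + [signal[i] - 0.97 * signal[i - 1]
                                for i in range(1, len(signal))]
    filters = mel_filterbank(num_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        spec = power_spectrum(emphasized[start:start + frame_len])
        # Small constant avoids log(0) for filters that capture no energy.
        log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                        for row in filters]
        features.append(dct2(log_energies, num_coeffs))
    return features


# Synthetic input: 0.1 s of a 440 Hz tone sampled at 8 kHz (stands in for real speech).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(800)]
features = mfcc(tone, sample_rate)
print(len(features), "frames,", len(features[0]), "coefficients per frame")
```

Each row of `features` is the kind of per-frame vector that would then be passed to a recognizer such as an HMM or an LSTM. In practice an FFT-based library routine would replace the naive DFT used here.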

Key Words

Cite This Article

"Speech to Text Conversion", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol. 12, Issue 3, pp. c77-c84, March 2025. Available: http://www.jetir.org/papers/JETIR2503210.pdf



Publication Details

Published Paper ID: JETIR2503210
Registration ID: 556553
Published In: Volume 12 | Issue 3 | Year March-2025
DOI (Digital Object Identifier):
Page No: c77-c84
Country: Kolhapur, Maharashtra, India
Area: Engineering
ISSN Number: 2349-5162
Publisher: IJ Publication

