Published in:

Volume 7 Issue 6
June-2020
eISSN: 2349-5162

Unique Identifier

JETIR2006189

Page Number

1306-1312

Title

REAL TIME VOICE CLONING

ISSN

2349-5162

Cite This Article

"REAL TIME VOICE CLONING", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol. 7, Issue 6, page no. 1306-1312, June-2020, Available: http://www.jetir.org/papers/JETIR2006189.pdf

Abstract

Recent progress in deep learning has shown impressive results in text-to-speech. To this end, a deep neural network is usually trained on a corpus of several hours of professionally recorded speech from a single speaker. Giving such a model a new voice is highly expensive, as it requires collecting a new dataset and retraining the model. Recent research has developed a three-stage pipeline that can clone a voice unseen during training from just a few seconds of reference speech, without retraining the model. The researchers share strikingly natural-sounding results. The plan is to replicate this model and open-source it. With a new vocoder model, the aim is to adapt the framework so that it runs in real time; the overall goal is a three-stage deep learning system that performs real-time voice cloning. This framework is based on a 2018 paper from Google, for which only one public implementation existed before ours. From a speech utterance of only 5 seconds, the system can capture a realistic digital representation of the speaker's voice. Given a text prompt, it can then perform text-to-speech with any voice extracted in this way. The plan is to replicate each of the three stages of the model, using our own implementations or open-source ones, and to implement effective deep learning models together with appropriate data pre-processing pipelines. The next step is to train these models for weeks or months on large datasets comprising tens of thousands of hours of speech from several thousand speakers, and then to examine their strengths and drawbacks. The main focus is on making the system function in real time, that is, capturing a voice and generating speech in less time than the duration of the speech produced. The framework will be able to clone voices it has never heard during training and to generate speech from text it has never seen.
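
The real-time criterion stated in the abstract (speech generated in less time than the duration of the speech produced) can be expressed as a ratio of processing time to audio duration. The following is a minimal sketch of that check, not code from the paper: the function names are hypothetical, and the 16 kHz sample rate in the usage note is an assumed value chosen only for illustration.

```python
def audio_duration(num_samples: int, sample_rate: int) -> float:
    """Duration in seconds of a waveform with the given sample count."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Ratio of compute time to the duration of the generated speech.

    Hypothetical helper: a value below 1.0 means the speech was produced
    faster than it takes to play back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if processing_seconds < 0:
        raise ValueError("processing_seconds must be non-negative")
    return processing_seconds / audio_seconds


def is_real_time(processing_seconds: float, audio_seconds: float) -> bool:
    """True when generation took less time than the audio lasts."""
    return real_time_factor(processing_seconds, audio_seconds) < 1.0
```

For example, under the assumed 16 kHz sample rate, a waveform of 64000 samples lasts 4 seconds; if the whole pipeline produced it in 2 seconds of compute, the ratio is 0.5 and the real-time condition holds.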

Publication Details

Published Paper ID: JETIR2006189
Registration ID: 234179
Published In: Volume 7 | Issue 6 | Year June-2020
DOI (Digital Object Identifier):
Page No: 1306-1312
ISSN Number: 2349-5162
