Abstract
This study presents a methodological approach to transforming static images with text overlays into dynamic videos. Using Python libraries such as OpenCV and MoviePy, the proposed method integrates text overlay, image processing, and video creation to produce engaging visual content. The effectiveness of the method is evaluated in terms of processing time, video quality, and user feedback. Experimental results demonstrate that the approach achieves high-quality visual output while ensuring efficient processing, making it suitable for applications in digital marketing, educational content, and social media. Traditional approaches typically yielded limited frame quality, with Structural Similarity Index (SSIM) scores around 0.75, Peak Signal-to-Noise Ratio (PSNR) values near 28 dB, and Mean Squared Error (MSE) for frame predictions at 0.04, leaving significant room for improvement. We present an enhanced framework that achieves higher accuracy and quality by combining transformer-based embedding fusion (Xformer), Hugging Face models for rich text embeddings, and OpenCV for refined image preprocessing. By optimizing the GAN architecture and introducing a temporal LSTM network, we improved frame quality and coherence across generated videos. Our model raises SSIM to 0.85 (an absolute gain of 0.10) and PSNR to 30.2 dB while reducing MSE to 0.02, providing smoother transitions and sharper visuals. Training accuracy for the LSTM model reached 90% over 50 epochs, with a testing accuracy of 88%, reflecting strong generalization compared to previous methods. Furthermore, GAN training loss decreased from 1.0 to 0.35 over 100 epochs, with testing loss stabilizing at 0.40, indicating reduced overfitting and consistent performance.
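As a rough illustration of the frame-quality metrics cited above, MSE and PSNR can be computed directly with NumPy; SSIM involves local luminance, contrast, and structure terms and is typically taken from a library such as scikit-image rather than hand-rolled. The toy frames and noise level below are illustrative assumptions, not the study's data:

```python
import numpy as np

def mse(a, b):
    """Mean Squared Error between two images scaled to [0, 1]; lower is better."""
    return float(np.mean((a - b) ** 2))

def psnr(a, b, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB; higher means less distortion."""
    err = mse(a, b)
    if err == 0:
        return float("inf")  # identical images
    return float(10 * np.log10(max_val ** 2 / err))

# Toy data: a reference frame and a noisy "generated" frame (illustrative only).
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
gen = np.clip(ref + rng.normal(0.0, 0.05, ref.shape), 0.0, 1.0)

print(round(mse(ref, gen), 4))
print(round(psnr(ref, gen), 1))
```

With Gaussian noise of standard deviation 0.05, the MSE lands near its variance (about 0.0025) and the PSNR in the mid-20s dB range, which is why reported gains of a few dB (28 to 30.2 dB here) correspond to visibly sharper frames.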
Our improvements establish new benchmarks in dynamic video generation from static images and text, increasing model accuracy, visual quality, and temporal stability beyond the limitations of earlier approaches. The primary objectives of this study are to develop a seamless process for adding text to images, create
dynamic transitions between these images, and compile them into an engaging video format. The paper concludes with a discussion of the findings, implications for various fields, and potential areas for future research, such as the incorporation of more advanced animations and real-time processing capabilities.
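The transition objective above can be sketched as a simple linear crossfade between consecutive frames, the kind of blend that OpenCV or MoviePy would then encode into a video. The function name and toy images here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def crossfade(img_a, img_b, n_frames):
    """Generate n_frames blending img_a into img_b via linear interpolation."""
    frames = []
    for t in np.linspace(0.0, 1.0, n_frames):
        # Convex combination: t=0 gives img_a, t=1 gives img_b.
        frames.append((1.0 - t) * img_a + t * img_b)
    return frames

# Toy frames: an all-black and an all-white 4x4 image.
a = np.zeros((4, 4))
b = np.ones((4, 4))
seq = crossfade(a, b, 5)
print(len(seq))               # 5
print(float(seq[0].mean()))   # 0.0
print(float(seq[-1].mean()))  # 1.0
```

In a full pipeline, each blended frame would be written out (e.g. with OpenCV's `VideoWriter` or assembled via MoviePy) at the target frame rate; more advanced transitions replace the linear blend with easing curves or motion warps.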