UGC Approved Journal no 63975(19)

ISSN: 2349-5162 | ESTD Year : 2014
Volume 12 | Issue 9 | September 2025


Published in:

Volume 12 Issue 4
April-2025
eISSN: 2349-5162

UGC and ISSN approved 7.95 impact factor UGC Approved Journal no 63975

7.95 impact factor calculated by Google scholar

Unique Identifier

Published Paper ID: JETIR2504B66
Registration ID: 560433
Page Number: l528-l536


Title

DIFFERENTIATION AND TAGGING OF REAL Vs SYNTHETIC DATA

Abstract

In our proposed work, we present a novel approach to address the increasing demand for large, high-quality datasets in machine learning (ML), particularly in the healthcare domain. Synthetic data is generated using four methods, including Copula and Generative Adversarial Networks (GANs), and evaluated for its applicability in lung cancer risk factor analysis. Incremental Ensemble Learning models, comprising Adaptive Random Forest classifiers, Softmax Regressor, and K-Nearest Neighbors (KNN), are employed to assess classification performance using synthetic versus real data. Pearson's correlation coefficient is utilized to measure data similarity, revealing a strong relationship between higher correlation and improved model performance. Among the methods, GAN-generated data demonstrated superior performance and was most challenging to distinguish from real data. Furthermore, the concept is extended to classify medical imaging datasets as real or synthetic. For real datasets, a watermark is embedded into patient reports generated from healthcare data to ensure authenticity and security. Synthetic datasets are excluded from watermarking to support research and simulation purposes. This integration of synthetic data classification, patient report generation, and watermarking enhances data reliability, promotes secure healthcare practices, and facilitates advancements in ML-based medical applications.
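
The abstract does not say exactly how Pearson's correlation coefficient is applied to compare real and synthetic data. As a minimal sketch of one plausible reading, the Python snippet below computes the pairwise feature-correlation matrix of each dataset and then takes the Pearson correlation between the two matrices' off-diagonal entries. The function name `correlation_similarity`, the toy data, and the choice to compare correlation matrices are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def correlation_similarity(real, synthetic):
    """Hypothetical similarity score between two tabular datasets.

    Both inputs are 2-D arrays (rows = samples, columns = features) with the
    same number of columns. The score is the Pearson correlation between the
    off-diagonal entries of the two feature-correlation matrices.
    """
    real_corr = np.corrcoef(real, rowvar=False)
    synth_corr = np.corrcoef(synthetic, rowvar=False)
    # The matrices are symmetric with a unit diagonal, so only the strictly
    # upper triangle carries information.
    upper = np.triu_indices_from(real_corr, k=1)
    return float(np.corrcoef(real_corr[upper], synth_corr[upper])[0, 1])


# Toy data (an assumption, not the paper's lung cancer dataset):
# cumulative sums of independent noise give six correlated features.
rng = np.random.default_rng(0)
real = np.cumsum(rng.normal(size=(2000, 6)), axis=1)
# "Good" synthetic data follows the same generating process.
good_synth = np.cumsum(rng.normal(size=(2000, 6)), axis=1)
# "Poor" synthetic data ignores the dependence between features.
poor_synth = rng.normal(size=(2000, 6))

score_good = correlation_similarity(real, good_synth)
score_poor = correlation_similarity(real, poor_synth)
print(f"good synthetic: {score_good:.3f}, poor synthetic: {score_poor:.3f}")
```

Under this reading, a generator that preserves the relationships between features scores close to 1, while one that loses them scores much lower. This is the kind of signal the abstract links to improved classifier performance.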

Key Words

Synthetic data, Machine learning, Healthcare, Generative Adversarial Networks (GANs), Incremental Ensemble Learning, Data similarity, Watermarking, Data reliability.

Cite This Article

"DIFFERENTIATION AND TAGGING OF REAL Vs SYNTHETIC DATA", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol. 12, Issue 4, page no. l528-l536, April-2025, Available: http://www.jetir.org/papers/JETIR2504B66.pdf

ISSN: 2349-5162 | Impact Factor 7.95, calculated by Google Scholar

An international, scholarly, open-access, peer-reviewed, refereed journal; multidisciplinary, monthly, and multilanguage.


Publication Details

Published Paper ID: JETIR2504B66
Registration ID: 560433
Published In: Volume 12 | Issue 4 | Year April-2025
DOI (Digital Object Identifier):
Page No: l528-l536
Country: Pondicherry, Puducherry, India
Area: Engineering
ISSN Number: 2349-5162
Publisher: IJ Publication

