UGC Approved Journal no 63975(19)

ISSN: 2349-5162 | ESTD Year : 2014
Published in:

Volume 5 Issue 3
March-2018
eISSN: 2349-5162

Impact factor: 7.95 (calculated by Google Scholar)

Unique Identifier

Published Paper ID:
JETIR1803054


Registration ID:
180566

Page Number

281-283

Title

Deduplication in Databases using Pattern Matching

Abstract

Semantic duplicates in databases are an important data quality challenge today, and they can lead to bad decisions. Large databases sometimes contain tens of thousands of duplicates, which makes automatic deduplication necessary. Deduplication is a capacity optimization technology used to dramatically improve storage efficiency. It requires a duplicate detection method that is reliable enough to find as many duplicates as possible and efficient enough to run in a reasonable time. The proposed system introduces an effective duplicate detection method for the automatic deduplication of text files and repeated strings. It works with a dataset from WordNet, a lexical database for the English language. WordNet groups English words into sets of synonyms called synsets, provides short definitions and usage examples, and records a number of relations among these synonym sets or their members. The proposed system checks whether strings or text files are semantically similar. If they are, the duplicates are removed and only one copy of the data is kept. To achieve this, the KMP (Knuth-Morris-Pratt) algorithm and the Levenshtein distance method are used. These algorithms give better results than known methods, with lower complexity.
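The abstract names two building blocks, KMP pattern matching and Levenshtein distance, without saying how the paper combines them. The sketch below is only an illustration under stated assumptions: `kmp_search`, `levenshtein`, `deduplicate`, and the `max_distance` threshold are hypothetical names and parameters, not taken from the paper, and the WordNet synonym lookup is left out entirely.

```python
def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text (KMP)."""
    if not pattern:
        return []
    # failure[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def deduplicate(strings: list[str], max_distance: int = 1) -> list[str]:
    """Keep the first copy of each group of near-identical strings.

    Assumption: two strings count as duplicates when their lowercased,
    stripped forms are within max_distance edits. The threshold is
    illustrative, not a value from the paper.
    """
    kept: list[str] = []
    for s in strings:
        normalized = s.strip().lower()
        if any(levenshtein(normalized, k.strip().lower()) <= max_distance
               for k in kept):
            continue  # a close enough copy is already kept
        kept.append(s)
    return kept
```

For example, `deduplicate(["database", "Database", "databse", "storage"])` keeps only `"database"` and `"storage"`, and `kmp_search` can locate repeated strings inside a longer text. A system like the one described would also need a WordNet synonym check before or alongside the edit-distance test to catch semantic rather than purely textual duplicates.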

Key Words

Deduplication, KMP algorithm, Levenshtein Distance, WordNet

Cite This Article

"Deduplication in Databases using Pattern Matching", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol. 5, Issue 3, pp. 281-283, March 2018. Available: http://www.jetir.org/papers/JETIR1803054.pdf

Publication Details

Published Paper ID: JETIR1803054
Registration ID: 180566
Published In: Volume 5 | Issue 3 | Year March-2018
DOI (Digital Object Identifier):
Page No: 281-283
Country: Shoranur, Kerala, India
Area: Engineering
ISSN Number: 2349-5162
Publisher: IJ Publication

