UGC Approved Journal no 63975(19)

ISSN: 2349-5162 | ESTD Year : 2014

Published in:

Volume 5 Issue 5
May-2018
eISSN: 2349-5162

UGC and ISSN approved 7.95 impact factor UGC Approved Journal no 63975


Unique Identifier

Published Paper ID:
JETIR1805812


Registration ID:
182693

Page Number

421-424


Title

Deduplication in Databases using Locality Sensitive Hashing and Bloom filter

Abstract

Duplicate records in databases are a significant data quality problem and can lead to poor decisions. Deduplication is a capacity optimization technique that improves storage efficiency and reduces storage cost by eliminating duplicate copies of data. Large databases can contain tens of thousands of duplicates, which makes automatic deduplication necessary. This paper introduces a duplicate detection method for the automatic deduplication of text files and repeated strings. We propose a similarity-based data deduplication scheme that integrates a Bloom filter with Locality Sensitive Hashing (LSH), which can significantly reduce computation overhead by performing deduplication operations only on similar texts. The proposed system checks the repository for strings or texts that are similar; when similar items are found, the duplicates are removed and only one copy of the data is kept. The combination of Locality Sensitive Hashing and a Bloom filter provides better results than known methods, with lower complexity.
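The following is a minimal Python sketch of how a Bloom filter and LSH might be combined for text deduplication. It is not the authors' implementation. The abstract does not name the LSH family, so the sketch assumes MinHash over character 3-grams with banding; it also assumes that candidates are confirmed with Levenshtein distance (taken from the keywords). All names (`salted_hash`, `BloomFilter`, `deduplicate`) and parameters (`bands`, `rows`, `max_ratio`) are hypothetical.

```python
import hashlib
from collections import defaultdict


def salted_hash(text: str, seed: int) -> int:
    """Deterministic 64-bit hash of `text`; `seed` selects the hash function."""
    digest = hashlib.blake2b(
        text.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


class BloomFilter:
    """Bit array with k hash functions: no false negatives, rare false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        return [salted_hash(item, s) % self.num_bits for s in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def shingles(text: str, k: int = 3) -> set:
    """Set of character k-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash(text: str, num_perm: int) -> list:
    """MinHash signature: the minimum shingle hash under each hash function."""
    grams = shingles(text)
    return [min(salted_hash(g, 1000 + i) for g in grams) for i in range(num_perm)]


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def deduplicate(records, bands: int = 16, rows: int = 2, max_ratio: float = 0.2):
    """Keep one copy of each group of identical or near-identical strings."""
    seen = BloomFilter()
    kept, kept_set = [], set()
    buckets = defaultdict(list)  # (band index, band values) -> kept records
    for rec in records:
        # Fast path: the Bloom filter flags possible exact duplicates; the set
        # lookup (assumed stand-in for a repository query) rules out false positives.
        if rec in seen and rec in kept_set:
            continue
        sig = minhash(rec, bands * rows)
        keys = [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)]
        # Only records sharing at least one LSH bucket are compared in full.
        candidates = {c for key in keys for c in buckets.get(key, ())}
        if any(levenshtein(rec, c) <= max_ratio * max(len(rec), len(c))
               for c in candidates):
            continue
        kept.append(rec)
        kept_set.add(rec)
        seen.add(rec)
        for key in keys:
            buckets[key].append(rec)
    return kept
```

As a usage example, calling `deduplicate` on a list that contains a sentence, an exact copy of it, a copy with one character removed, and an unrelated sentence is expected to keep only the first and the last entries. The costly edit-distance comparison runs only on strings that share an LSH bucket, which is the source of the reduced computation overhead that the abstract describes.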

Key Words

Deduplication, Bloom filter, Levenshtein Distance

Cite This Article

"Deduplication in Databases using Locality Sensitive Hashing and Bloom filter", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol. 5, Issue 5, pp. 421-424, May 2018. Available: http://www.jetir.org/papers/JETIR1805812.pdf


Publication Details

Published Paper ID: JETIR1805812
Registration ID: 182693
Published In: Volume 5 | Issue 5 | Year May-2018
DOI (Digital Object Identifier):
Page No: 421-424
Country: Shoranur, Kerala, India
Area: Engineering
ISSN Number: 2349-5162
Publisher: IJ Publication

