UGC Approved Journal no 63975(19)

ISSN: 2349-5162 | ESTD Year : 2014

Published in:

Volume 6 Issue 6
June-2019
eISSN: 2349-5162

UGC and ISSN approved | Impact factor 7.95 (calculated by Google Scholar)

Unique Identifier

Published Paper ID: JETIR1906589
Registration ID: 214577
Page Number: 35-42


Title

Record-aware Partial Compression scheme in Spark

Authors

Abstract

Digital data is growing in every domain, including banking, healthcare, education, and social science, and a large share of it is stored as text. Compression supports effective data analysis by reducing data size, storage space, and transmission cost. An existing system, the Content-aware Partial Compression scheme (CaPC), improves performance and reduces resource usage for textual big data analysis in Hadoop, but it leaves room for a better compression ratio and performance. To maximize compression for data analysis, the two-layered Record-aware Partial Compression scheme (RaPC) was developed in Hadoop. The main disadvantage of Hadoop, however, is that it supports only batch processing and cannot support stream processing. Batch processing handles large amounts of data efficiently, but its running time depends on the data size and the computational power of the system, so output is delayed and overall performance is slow. The proposed work replaces Hadoop with Spark, which supports stream processing. Stream processing handles input and output continuously, so data is processed in less time. Spark processes data in memory. Because memory is still an expensive hardware resource, memory consumption is reduced by using the CaPC or RaPC Layer-1 scheme. With CaPC or RaPC Layer-1 compression, the compressed data can be processed directly without decompression.

Key Words

Big data, Data storage, Compression, Spark, In-memory processing, RaPC scheme, Huffman encoding

Cite This Article

"Record-aware Partial Compression scheme in Spark", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol. 6, Issue 6, pp. 35-42, June 2019. Available: http://www.jetir.org/papers/JETIR1906589.pdf


Publication Details

Published Paper ID: JETIR1906589
Registration ID: 214577
Published In: Volume 6 | Issue 6 | Year June-2019
DOI (Digital Object Identifier):
Page No: 35-42
Country: Ananthapur, Andhra Pradesh, India
Area: Engineering
ISSN Number: 2349-5162
Publisher: IJ Publication

