UGC Approved Journal no 63975(19)

ISSN: 2349-5162 | ESTD Year : 2014

Published in:

Volume 5 Issue 5
May-2018
eISSN: 2349-5162

UGC and ISSN approved 7.95 impact factor UGC Approved Journal no 63975

Unique Identifier

Published Paper ID:
JETIR1805753


Registration ID:
182952

Page Number

893-897


Title

Identifying spam SMS using Apache Spark MLlib

Abstract

Short Message Service (SMS) usage has grown enormously because it is flexible and makes communication effortless for mobile phone users. At the same time, the increasing popularity of SMS and the falling cost of messaging have made advertisers more interested in sending unsolicited commercial advertisements (spam) to users. Spam messages are annoying, and many people do not want to receive them. Many methods are available to detect spam messages; classifiers based on Naïve Bayes, Support Vector Machine and many other ML algorithms have already been used. In this paper we compare different classifiers from the Apache Spark MLlib library to evaluate their accuracy and runtime. We applied Logistic Regression with L-BFGS (Limited-memory Broyden–Fletcher–Goldfarb–Shanno), Naïve Bayes, Decision Tree and Gradient Boosted Trees to the feature vectors. Besides comparing the classifiers, we also describe the most important features used as input to the Decision Tree, Gradient Boosted Trees, Naïve Bayes and Logistic Regression with L-BFGS classifiers. The features that help most in detecting spam SMS are the existence of a URL and the number of digits present in an SMS. The experiment used the SMS spam dataset from the UCI Machine Learning Repository. The dataset was split into two parts: 80% of the data was used for training and 20% for testing. The experiments show that Naïve Bayes was the fastest algorithm to reach the best accuracy: it took 3.16 seconds to achieve 95% accuracy on the test data.
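The two features highlighted in the abstract and the 80/20 split can be illustrated with a small, library-free sketch. This is not the authors' Spark MLlib code: the URL pattern, the function names `extract_features` and `train_test_split`, the random seed, and the sample messages are all assumptions made for illustration.

```python
import random
import re

# Assumed heuristic: "http://", "https://" or "www." marks a URL.
# The paper does not say how URLs were detected.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def extract_features(message):
    """Return the two features named in the abstract for one SMS."""
    return {
        "has_url": 1 if URL_PATTERN.search(message) else 0,
        "digit_count": sum(ch.isdigit() for ch in message),
    }


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up (label, text) pairs; these are not rows from the UCI dataset.
messages = [
    ("spam", "WIN a prize now! Visit www.example.com or call 0800123456"),
    ("ham", "Are we still meeting at 5 today?"),
    ("spam", "Claim 1000 points at http://example.org today"),
    ("ham", "Thanks, see you tomorrow"),
    ("ham", "Running late, start without me"),
]

# Turn each message into a (label, feature dict) row, then split 80/20.
rows = [(label, extract_features(text)) for label, text in messages]
train_rows, test_rows = train_test_split(rows)

for label, features in rows:
    print(label, features)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

In a Spark pipeline, rows like these would be converted to feature vectors and passed to the MLlib classifiers listed in the abstract; that step is omitted here because the paper's exact configuration is not given on this page.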

Key Words

SMS Spam, Naïve Bayes, Decision Tree, Logistic Regression with L-BFGS, Gradient Boosted Trees, Apache Spark MLlib

Cite This Article

"Identifying spam SMS using Apache Spark MLlib", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol. 5, Issue 5, pp. 893-897, May 2018. Available: http://www.jetir.org/papers/JETIR1805753.pdf

Publication Details

Published Paper ID: JETIR1805753
Registration ID: 182952
Published In: Volume 5 | Issue 5 | Year May-2018
DOI (Digital Object Identifier):
Page No: 893-897
Country: Paschim Medinipur, West Bengal, India
Area: Science & Technology
ISSN Number: 2349-5162
Publisher: IJ Publication

