Published in:

Volume 7 Issue 6
June-2020
eISSN: 2349-5162

Unique Identifier

JETIR2006313

Page Number

2156-2162

Title

An Improved Approach for Fast Documents Scrapping and Classifying Using Selenium Automation and Multinomial Naïve Bayes Classifier

ISSN

2349-5162

Cite This Article

"An Improved Approach for Fast Documents Scrapping and Classifying Using Selenium Automation and Multinomial Naïve Bayes Classifier", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol. 7, Issue 6, pp. 2156-2162, June 2020. Available: http://www.jetir.org/papers/JETIR2006313.pdf

Abstract

Selenium automation is generally used for testing, to detect errors and defects in a system under development. In this work, however, we use Selenium to build a list of the required web elements on a web page and, from that list, to identify new documents on the page whose data should be scraped. Consider a tenders site, for example: thousands of tenders may be published every day, so it is very hard for a user to browse them one after another to find the tenders he or she needs. In our method, we first use the bag-of-words method to gather test data for classification. Second, we use a Multinomial Naïve Bayes classifier to classify the documents by industry, which lets the user pick out fresh tenders in his or her category. To pick up a fresh tender, the user opens a folder created on the desktop, in which the freshly scraped documents are stored in technology-wise subfolders. Finally, a confusion matrix is built, and the detailed accuracy by class for the technology category is calculated and shown. This approach helps larger service-providing business organizations supply their clients with documents in the categories they need.
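The classification stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain Python with no external libraries, Laplace smoothing with alpha = 1, and a handful of invented tender titles and category names ("IT", "Construction") used only for illustration. The Selenium scraping step is left out, so the documents are given directly as strings.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a document and split it into alphabetic word tokens."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words counts (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (an assumption)

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        # Bag of words for the new document; unseen words are ignored.
        counts = Counter(w for w in tokenize(doc) if w in self.vocab)
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_docs):
            total = sum(self.word_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            # log P(class) + sum over words of count * log P(word | class)
            score = math.log(self.class_docs[label] / self.total_docs)
            for word, n in counts.items():
                smoothed = self.word_counts[label][word] + self.alpha
                score += n * math.log(smoothed / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def confusion_matrix(actual, predicted, classes):
    """Return matrix[actual_class][predicted_class] as nested dicts."""
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


# Hypothetical tender titles and categories, invented for this sketch.
train_docs = [
    "supply of desktop computers and network switches",
    "software development for web portal",
    "construction of road bridge and culvert",
    "civil work for building construction",
]
train_labels = ["IT", "IT", "Construction", "Construction"]

test_docs = [
    "development of software for network portal",
    "construction of culvert and road repair work",
]
test_labels = ["IT", "Construction"]

model = MultinomialNB().fit(train_docs, train_labels)
predicted = [model.predict(d) for d in test_docs]

classes = sorted(set(train_labels))
matrix = confusion_matrix(test_labels, predicted, classes)

# Per-class recall: correct predictions divided by actual documents in the class.
recall = {c: matrix[c][c] / sum(matrix[c].values()) for c in classes}

print(predicted)
print(matrix)
print(recall)
```

In a pipeline like the one the abstract outlines, the predicted category would then decide which subfolder a scraped document is written to. The per-class recall computed here is only one possible reading of "detailed accuracy by class".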

Key Words

Selenium Web Driver, Multinomial Naïve Bayes Classifier, Bag of Words, Stream Writer, Web Scraping.

Publication Details

Published Paper ID: JETIR2006313
Registration ID: 234486
Published In: Volume 7 | Issue 6 | Year June-2020
DOI (Digital Object Identifier):
Page No: 2156-2162
ISSN Number: 2349-5162
