UGC Approved Journal no 63975(19)

ISSN: 2349-5162 | ESTD Year : 2014

Published in:

Volume 7 Issue 6
June-2020
eISSN: 2349-5162

UGC and ISSN approved 7.95 impact factor UGC Approved Journal no 63975


Unique Identifier

Published Paper ID: JETIR2006313
Registration ID: 234486
Page Number: 2156-2162


Title

An Improved Approach for Fast Documents Scrapping and Classifying Using Selenium Automation and Multinomial Naïve Bayes Classifier

Abstract

Selenium automation is generally used for testing, that is, for detecting errors and defects in a system under development. Here, however, we use Selenium to build a list of the required web elements on a web page, and then use that list to identify new documents on the page whose data should be scraped. Consider, for example, a tender site: thousands of tenders may be published every day, so it is very hard for a user to browse them one by one to find those that match his or her needs. Our method works in two steps. First, we use the bag-of-words method to gather test data for classification. Second, we use a Multinomial Naïve Bayes classifier to classify the documents by industry, which helps the user pick out fresh tenders in his or her category. To pick up a fresh tender, the user opens a folder created on the desktop, where the newly scraped documents are stored in technology-wise subfolders. Finally, a confusion matrix is built, and the detailed accuracy by class for the technology category is calculated and shown. This approach helps larger service-providing business organizations supply their clients with documents in the categories they need.
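
The classification steps named in the abstract (bag of words, Multinomial Naïve Bayes, confusion matrix) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the tender titles, the category labels, the Laplace smoothing value, and every function and class name below are assumptions made for illustration, and the Selenium scraping step is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Bag-of-words tokens: lowercase alphanumeric runs, word order ignored.
    return re.findall(r"[a-z0-9]+", text.lower())


class MultinomialNaiveBayes:
    """Hypothetical minimal classifier with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.total_docs = len(docs)
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # log P(c) + sum over tokens of log P(token | c)
            score = math.log(self.doc_counts[c] / self.total_docs)
            denom = sum(self.word_counts[c].values()) + self.alpha * len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def confusion_matrix(true_labels, predicted_labels, classes):
    # Rows are true classes, columns are predicted classes.
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


# Made-up tender titles and industry labels (assumed, not from the paper).
train = [
    ("Supply of network switches and routers for data centre", "IT"),
    ("Development of web portal and mobile application software", "IT"),
    ("Construction of two lane road and drainage works", "Construction"),
    ("Repair of school building roof and concrete flooring", "Construction"),
    ("Procurement of surgical gloves and medical equipment", "Medical"),
    ("Supply of medicines and hospital beds", "Medical"),
]
test = [
    ("Software maintenance for web application", "IT"),
    ("Road resurfacing and concrete drainage construction", "Construction"),
    ("Supply of medical equipment for hospital", "Medical"),
]

model = MultinomialNaiveBayes().fit([d for d, _ in train], [l for _, l in train])
true_labels = [l for _, l in test]
predictions = [model.predict(d) for d, _ in test]
matrix = confusion_matrix(true_labels, predictions, model.classes)

# Per-class recall: correct predictions for a class / documents of that class.
recall = {
    c: (matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else 0.0)
    for i, c in enumerate(model.classes)
}
print(predictions)
print(matrix)
print(recall)
```

In a pipeline like the one described, each predicted label would decide which category folder a scraped document is written to, and the confusion matrix would be computed on a held-out labelled set rather than on three toy examples.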

Key Words

Selenium Web Driver, Multinomial Naïve Bayes Classifier, Bag of Words, Stream Writer, Web Scrapping.

Cite This Article

"An Improved Approach for Fast Documents Scrapping and Classifying Using Selenium Automation and Multinomial Naïve Bayes Classifier", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol. 7, Issue 6, pp. 2156-2162, June 2020. Available: http://www.jetir.org/papers/JETIR2006313.pdf


Publication Details

Published Paper ID: JETIR2006313
Registration ID: 234486
Published In: Volume 7 | Issue 6 | Year June-2020
DOI (Digital Object Identifier):
Page No: 2156-2162
Country: Ahmedabad, Gujarat, India
Area: Engineering
ISSN Number: 2349-5162
Publisher: IJ Publication

