UGC Approved Journal no 63975(19)

ISSN: 2349-5162 | ESTD Year : 2014

Published in: Volume 6, Issue 5, May-2019
eISSN: 2349-5162

UGC and ISSN approved | Impact factor 7.95 (calculated by Google Scholar)

Unique Identifier

Published Paper ID: JETIRCY06027
Registration ID: 219553
Page Number: 168-179

Title

Analysis of Similarity Metrics Through Clustering using WordCount, TF_IDF and Probability

Abstract

Text documents make up much of the content used on the World Wide Web. Many users need large amounts of text to gather essential knowledge in their field of interest, and serving them means retrieving the documents most relevant to the requested topic. To meet this requirement, and to index and retrieve documents, researchers have produced many new algorithms in the field of text document mining. The proposed system addresses the problem with two operations. The first is feature extraction; the second is clustering. In the first operation, features are extracted from the text documents through preprocessing, tokenization, stop word removal, stemming, and bag-of-words construction. These steps yield three document-representing features: WordCount, TF_IDF, and the probability of word occurrence. The second operation is performed with the K-means clustering algorithm. In the clustering phase, the three features were combined with several similarity measures to assess overall performance. The proposed method yields better performance with Spearman similarity than with the other two metrics, cosine similarity and correlation similarity.
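The feature extraction and similarity steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the stop word list, the whitespace tokenizer, the omission of stemming, the TF-IDF variant (term frequency times log(N/df)), and all function names are assumptions made for illustration. The K-means step is not shown.

```python
import math
from collections import Counter

# Illustrative stop word list (assumption; the paper does not list its own).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "on", "to", "as"}

def tokenize(text):
    # Lowercase, split on whitespace, keep alphabetic non-stop-words.
    # Stemming is omitted here for brevity.
    return [w for w in text.lower().split() if w.isalpha() and w not in STOP_WORDS]

def build_features(docs):
    # Returns the vocabulary plus three per-document vectors:
    # raw word counts, word occurrence probabilities, and TF-IDF weights.
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    n_docs = len(docs)
    df = Counter(w for t in tokens for w in set(t))  # document frequency
    counts, probs, tfidf = [], [], []
    for t in tokens:
        c = Counter(t)
        total = len(t) or 1  # avoid division by zero for empty documents
        counts.append([c[w] for w in vocab])
        probs.append([c[w] / total for w in vocab])
        tfidf.append([(c[w] / total) * math.log(n_docs / df[w]) for w in vocab])
    return vocab, counts, probs, tfidf

def cosine(u, v):
    # Cosine similarity; defined as 0.0 when either vector has zero norm.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pearson(u, v):
    # Correlation similarity: cosine of the mean-centered vectors.
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])

def ranks(x):
    # 1-based ranks, with tied values sharing their average rank.
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(u, v):
    # Spearman similarity: Pearson correlation of the rank vectors.
    return pearson(ranks(u), ranks(v))

if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the dog sat on the log",
            "stocks fell as markets closed"]
    vocab, counts, probs, tfidf = build_features(docs)
    for name, sim in (("cosine", cosine), ("correlation", pearson), ("spearman", spearman)):
        print(name, sim(counts[0], counts[1]), sim(counts[0], counts[2]))
```

Any of the three feature vectors can be passed to any of the three similarity functions, which mirrors the feature-by-metric comparison the abstract describes. A clustering step would then assign each document to the centroid it is most similar to.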

Key Words

TFIDF, Word Frequency, Probability, pre-processing, Clustering, K-Means

Cite This Article

"Analysis of Similarity Metrics Through Clustering using WordCount, TF_IDF and Probability", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol. 6, Issue 5, pp. 168-179, May 2019. Available: http://www.jetir.org/papers/JETIRCY06027.pdf

Publication Details

Published Paper ID: JETIRCY06027
Registration ID: 219553
Published In: Volume 6 | Issue 5 | Year May-2019
DOI (Digital Object Identifier):
Page No: 168-179
Country: Chennai, Tamil Nadu, India
Area: Engineering
ISSN Number: 2349-5162
Publisher: IJ Publication

