UGC Approved Journal no 63975(19)

ISSN: 2349-5162 | ESTD Year : 2014





Published in:

Volume 9 Issue 1
January-2022
eISSN: 2349-5162


Unique Identifier

Published Paper ID:
JETIR2201143


Registration ID:
318636

Page Number

b321-b329


Title

TF_IDF AND PROBABILITY BASED CLUSTERING SCHEME FOR LARGE DENSE TEXT DOCUMENT

Abstract

Text documents play an important role in the digital world. Many users need a large number of text documents to gather information in their fields of interest, so documents on the required topic must be retrieved for internet users. To index and retrieve text documents, researchers have proposed many algorithms in the field of text mining. The effectiveness of clustering depends largely on selecting an appropriate similarity metric. The proposed system performs clustering in two stages: the first extracts features from the document corpus, and the second performs the clustering. In the feature-extraction stage, preprocessing, tokenization, stop-word removal, stemming, and bag-of-words construction were performed. This yielded two document-representation features, TF-IDF and word probability, which were then used to cluster the documents with the K-means algorithm. Clustering was carried out with each of the two features under several similarity measures. The proposed method performs better with Spearman similarity than with the other two metrics, cosine similarity and Pearson correlation similarity.
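The two-stage pipeline described in the abstract (feature extraction, then K-means clustering under a chosen similarity measure) can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny stop-word list, the crude suffix-stripping stemmer (standing in for a real stemmer such as Porter's), the smoothed IDF formula, the Laplace-smoothed definition of word probability, and the deterministic farthest-first seeding are all choices made for this sketch, because the abstract does not specify them. All function names are hypothetical. Each document is assigned to the centroid with the highest similarity, so cosine, Pearson, or Spearman similarity can be passed in as `sim`.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "on", "for", "with"}


def stem(word):
    """Crude suffix stripping, standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, lowercase, drop (hypothetical) STOP_WORDS, stem; return a bag of words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(t) for t in tokens if t not in STOP_WORDS)


def tfidf_vectors(bags, vocab):
    """tf = count / |d|; smoothed idf = ln((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n = len(bags)
    df = {w: sum(1 for bag in bags if w in bag) for w in vocab}
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    totals = [sum(bag.values()) or 1 for bag in bags]
    return [[bag[w] / t * idf[w] for w in vocab] for bag, t in zip(bags, totals)]


def probability_vectors(bags, vocab):
    """Assumed feature: Laplace-smoothed P(w | d) = (count + 1) / (|d| + |V|)."""
    return [[(bag[w] + 1) / (sum(bag.values()) + len(vocab)) for w in vocab] for bag in bags]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson(x, y):
    """Pearson correlation is the cosine similarity of the mean-centered vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


def ranks(x):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def kmeans(vectors, k, sim, iters=20):
    """K-means that assigns each vector to its most similar centroid under `sim`."""
    # Deterministic farthest-first seeding (an assumption; the paper's seeding is not stated).
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(sim(vectors[i], vectors[s]) for s in seeds)))
    centroids = [list(vectors[s]) for s in seeds]
    labels = None
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda c: sim(v, centroids[c])) for v in vectors]
        if new_labels == labels:  # converged: no document changed cluster
            break
        labels = new_labels
        for c in range(k):  # centroid = arithmetic mean of member vectors
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    docs = ["The cat chased the mouse in the garden",
            "A cat and a kitten playing with a mouse",
            "Stock markets fell as investors sold shares",
            "Investors bought shares when the stock market rallied"]
    bags = [preprocess(d) for d in docs]
    vocab = sorted(set().union(*bags))
    features = {"tfidf": tfidf_vectors(bags, vocab), "prob": probability_vectors(bags, vocab)}
    for name, vecs in features.items():
        for sim in (cosine, pearson, spearman):
            print(name, sim.__name__, kmeans(vecs, 2, sim))
```

Swapping `sim` is the only change needed to compare the three measures, which mirrors running the same K-means step under different similarity metrics. On this four-document toy corpus every feature/similarity pair separates the pet documents from the market documents; a corpus this small says nothing about which metric performs best.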

Key Words

TF-IDF, Probability, Preprocessing, Clustering, K-Means

Cite This Article

"TF_IDF AND PROBABILITY BASED CLUSTERING SCHEME FOR LARGE DENSE TEXT DOCUMENT", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol. 9, Issue 1, page no. b321-b329, January-2022, Available at: http://www.jetir.org/papers/JETIR2201143.pdf



Publication Details

DOI (Digital Object Identifier):
Country: Chidambaram, Tamil Nadu, India
Area: Science & Technology
Publisher: IJ Publication

