UGC Approved Journal no 63975(19)

ISSN: 2349-5162 | ESTD Year : 2014

Published in:

Volume 5 Issue 5
May-2018
eISSN: 2349-5162

UGC and ISSN approved; UGC Approved Journal no 63975

Impact factor 7.95, calculated by Google Scholar

Unique Identifier

Published Paper ID:
JETIR1805905


Registration ID:
228439

Page Number

991-994


Title

USING PARSER PERFORMS SEGMENTATION OF WEB PAGES AND EXTRACTION OF TEMPLATES

Abstract

Many Web sites contain explicit and implicit structure, in both layout and content, that we can exploit for information extraction. This paper describes an approach to automatic extraction and segmentation of records from Web tables. Automatic methods do not require any user input; they rely solely on the layout and content of the Web source. Our approach relies on the common structure of many Web sites, which present information as a list or a table, with a link in each entry leading to a detail page containing additional information about that item. We describe two algorithms that use redundancies in the content of table and detail pages to aid in information extraction. The first algorithm encodes the additional information provided by detail pages as constraints and finds the segmentation by solving a constraint satisfaction problem. The second algorithm uses probabilistic inference to find the record segmentation. We show how each approach can exploit Web site structure in a general, domain-independent manner, and we demonstrate the effectiveness of each algorithm on a set of twelve Web sites.
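To make the constraint-based idea concrete, the following is a minimal sketch, not the authors' algorithm. It assumes that the table page has already been split into an ordered list of text extracts, that each record has exactly one detail page, and that records appear contiguously and in order on the table page. The function names `build_candidates` and `segment_records` and the sample data are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Set


def build_candidates(extracts: List[str], detail_pages: List[str]) -> List[Set[int]]:
    """For each table extract, collect the records whose detail page contains it."""
    return [
        {r for r, page in enumerate(detail_pages) if text in page}
        for text in extracts
    ]


def segment_records(candidates: List[Set[int]], n_records: int) -> Optional[List[int]]:
    """Assign every extract to one record, or return None if no assignment exists.

    Constraints assumed in this sketch:
      * extract i may only be assigned to a record in candidates[i];
      * record indices never decrease along the table (records are contiguous);
      * every record receives at least one extract.
    """
    assignment: List[int] = []

    def backtrack(i: int, last: int) -> bool:
        if i == len(candidates):
            # All extracts placed; valid only if the final record was reached.
            return last == n_records - 1
        # Either stay in the current record or move on to the next one.
        for r in (last, last + 1):
            if 0 <= r < n_records and r in candidates[i]:
                assignment.append(r)
                if backtrack(i + 1, r):
                    return True
                assignment.pop()
        return False

    return assignment if backtrack(0, -1) else None


# Hypothetical table page with two records; "Boston" appears on both detail pages.
extracts = ["Alice Lee", "Boston", "Bob Ray", "Boston"]
detail_pages = ["Alice Lee, Boston, MA", "Bob Ray, Boston, MA"]
segmentation = segment_records(build_candidates(extracts, detail_pages), 2)
```

Here the ambiguous value "Boston" is resolved by the ordering constraint: the first occurrence must belong to the first record and the second to the second record. A real table would need tolerant string matching and a way to handle extracts that appear on no detail page, both of which this sketch leaves out.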

Key Words

Parser, Web Pages, Extract

Cite This Article

"USING PARSER PERFORMS SEGMENTATION OF WEB PAGES AND EXTRACTION OF TEMPLATES", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol. 5, Issue 5, pp. 991-994, May-2018. Available: http://www.jetir.org/papers/JETIR1805905.pdf


An international, scholarly, open-access, peer-reviewed, refereed journal. Impact factor 7.95, calculated by Google Scholar and Semantic Scholar. Multidisciplinary, monthly, multilanguage journal, indexed in major databases.


Publication Details

Published Paper ID: JETIR1805905
Registration ID: 228439
Published In: Volume 5 | Issue 5 | Year May-2018
DOI (Digital Object Identifier):
Page No: 991-994
Country: -, -, - .
Area: Engineering
ISSN Number: 2349-5162
Publisher: IJ Publication

