Information Resources Management Association

Web Harvesting: Web Data Extraction Techniques for Deep Web Pages

Author(s): B. Umamageswari (New Prince Shri Bhavani College of Engineering and Technology, India) and R. Kalpana (Pondicherry Engineering College, India)
Copyright: 2017
Pages: 28
Source title: Web Usage Mining Techniques and Applications Across Industries
Source Author(s)/Editor(s): A.V. Senthil Kumar (Hindusthan College of Arts and Science, India)
DOI: 10.4018/978-1-5225-0613-3.ch014

Abstract

Web mining operates on large volumes of data extracted from the World Wide Web. Researchers have developed many state-of-the-art approaches for web data extraction, but the literature so far has focused mainly on techniques for data region extraction. Applications that consume the extracted data often need data spread across multiple web pages, which should be crawled automatically. To make this possible, both the data regions and the navigation links must be extracted. Data extraction techniques are also designed for specific HTML tags, which calls into question their universal applicability to information extraction from differently formatted web pages. This chapter covers the web data extraction techniques available for different kinds of data-rich pages, a classification of those techniques, and a comparison of them across several useful dimensions.
