Information Resources Management Association

Data Extraction from Deep Web Sites

Author(s): Hadrian Peter (University of the West Indies, Barbados) and Charles Greenidge (University of the West Indies, Barbados)
Copyright: 2008
Pages: 8
Source title: Encyclopedia of Internet Technologies and Applications
Source Author(s)/Editor(s): Mario Freire (University of Beira Interior, Portugal) and Manuela Pereira (University of Beira Interior, Portugal)
DOI: 10.4018/978-1-59140-993-9.ch021


Abstract

Traditionally, a great deal of research has been devoted to data extraction on the web (Crescenzi et al., 2001; Embley et al., 2005; Laender et al., 2002; Hammer et al., 1997; Ribeiro-Neto et al., 1999; Huck et al., 1998; Wang & Lochovsky, 2002, 2003), focusing on areas where data is easily indexed and extracted by a search engine, the so-called Surface Web. There are, however, other sites, greater in scale and potentially more vital, that contain information which cannot be readily indexed by standard search engines. These sites, which have been designed to require some level of direct human participation (for example, to issue queries rather than simply follow hyperlinks), cannot be handled using the simple link traversal techniques used by many web crawlers (Rappaport, 2000; Cho & Garcia-Molina, 2000; Cho et al., 1998; Edwards et al., 2001). This area of the web, which has been operationally off-limits for crawlers using standard indexing procedures, is termed the Deep Web (Zillman, 2005; Bergman, 2000). Much work still needs to be done, as Deep Web sites represent an area that has only recently begun to be explored to identify potential uses.
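
The limitation of link traversal described above can be illustrated with a minimal sketch. It is not the authors' method: the in-memory `SITE` dictionary, its paths, the `LinkAndFormParser` class, and the `crawl` function are all hypothetical names chosen for illustration, and the "site" is a toy stand-in so the example runs without network access. The crawler follows hyperlinks breadth-first and records any query forms it encounters, but it never issues a query, so the record page behind the form stays unvisited.

```python
from collections import deque
from html.parser import HTMLParser


class LinkAndFormParser(HTMLParser):
    """Collects hyperlink targets and form actions found on one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "form" and "action" in attrs:
            self.forms.append(attrs["action"])


# Hypothetical toy site (assumption, not from the source): paths mapped to HTML.
# The record page exists only as the response to a query, so no hyperlink
# anywhere on the site points at it.
SITE = {
    "/": '<a href="/about">About</a> <a href="/search">Search</a>',
    "/about": '<a href="/">Home</a>',
    "/search": '<form action="/results"><input name="q"></form>',
    "/results?q=barbados": "<p>Record 17: Barbados</p>",
}


def crawl(start):
    """Breadth-first link traversal; returns (pages visited, form actions seen)."""
    visited, forms_seen = [], []
    seen = {start}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        html = SITE.get(path)
        if html is None:
            continue
        visited.append(path)
        parser = LinkAndFormParser()
        parser.feed(html)
        forms_seen.extend(parser.forms)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, forms_seen


visited, forms_seen = crawl("/")
print(visited)     # pages reachable purely by following hyperlinks
print(forms_seen)  # query interfaces the crawler saw but did not use
```

In this sketch the crawler reaches the search page and notices its form, yet `/results?q=barbados` never appears among the visited pages, because reaching it requires supplying a query value rather than following a link.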
