Big Data Techniques for Supporting Official Statistics: The Use of Web Scraping for Collecting Price Data

Author(s): Antonino Virgillito (Istituto Nazionale di Statistica (ISTAT), Italy) and Federico Polidoro (Istituto Nazionale di Statistica (ISTAT), Italy)
Copyright: 2019
Pages: 17
Source title: Web Services: Concepts, Methodologies, Tools, and Applications
Source Author(s)/Editor(s): Information Resources Management Association (USA)
DOI: 10.4018/978-1-5225-7501-6.ch040

Abstract

Following the advent of Big Data, statistical offices have been widely exploring the use of the Internet as a data source for modernizing their data collection processes. In particular, several statistical institutes collect prices online through a technique known as web scraping. The objective of the chapter is to discuss the challenges of using web scraping to set up a continuous data collection process, exploring and classifying the most widespread techniques and showing how they are used in practical cases. The main technical notions behind web scraping are presented and explained so that readers with no IT background also have sufficient elements to fully understand scraping techniques, promoting the building of the mixed skills that lie at the heart of modern data science. The challenges that the use of web scraping poses for official statistics are briefly sketched. Finally, research ideas for overcoming the limitations of current techniques are presented and discussed.
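As a minimal illustration of the basic idea behind scraping price data (locate the page elements that hold product names and prices, then normalize the price strings into numbers), the sketch below parses an in-memory HTML fragment with Python's standard `html.parser` module. The markup, the `name` and `price` class names, the helper names `PriceParser`, `parse_price` and `scrape_prices`, and the comma-decimal price format are all hypothetical assumptions, not taken from the chapter. A real collection process would also have to download the pages and cope with changes in each site's structure.

```python
from decimal import Decimal
from html.parser import HTMLParser

# Hypothetical product-listing markup; a real retailer page will differ.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Pasta 500g</span><span class="price">1,29 €</span></li>
  <li class="product"><span class="name">Olive oil 1l</span><span class="price">6,90 €</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collects (name, raw price) pairs from <span class="name"> / <span class="price">."""

    def __init__(self):
        super().__init__()
        self.items = []     # completed (name, raw_price) records
        self._field = None  # field held by the span currently open, if any
        self._current = {}  # fields gathered for the product being read

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        for field in ("name", "price"):
            if field in classes:
                self._field = field

    def handle_data(self, data):
        # Keep only text that sits inside a name/price span.
        if self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li":
            # A list item closes one product; keep it only if both fields were found.
            if "name" in self._current and "price" in self._current:
                self.items.append((self._current["name"], self._current["price"]))
            self._current = {}


def parse_price(text):
    # Assumes European formatting: "." as thousands separator, "," as decimal separator.
    cleaned = text.replace("€", "").replace(".", "").replace(",", ".").strip()
    return Decimal(cleaned)


def scrape_prices(html):
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return [(name, parse_price(raw)) for name, raw in parser.items]


if __name__ == "__main__":
    for name, price in scrape_prices(SAMPLE_HTML):
        print(name, price)
```

`Decimal` is used instead of `float` so that prices keep their exact cents, which matters when the values later feed into price indices.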

Related Content

Mohib Ullah, Arbab Waseem Abbas, Lala Rukh, Kamran Ullah, Muhammad Inam Ul Haq. © 2023. 25 pages.
Rafi Ullah Khan, Mohib Ullah, Bushra Shafi, Imran Ihsan. © 2023. 20 pages.
Rafi Ullah Khan, Mohib Ullah, Bushra Shafi. © 2023. 17 pages.
Shaukat Ali, Shah Khusro, Mumtaz Khan. © 2023. 34 pages.
Tayyaba Riaz, Iftikhar Alam. © 2023. 20 pages.
Ufuk Uçak, Gurkan Tuna. © 2023. 22 pages.
Muhammad Hamad, Altaf Hussain, Majida Khan Tareen. © 2023. 21 pages.