IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Big Data Techniques for Supporting Official Statistics: The Use of Web Scraping for Collecting Price Data

Author(s): Antonino Virgillito (Istituto Nazionale di Statistica (ISTAT), Italy) and Federico Polidoro (Istituto Nazionale di Statistica (ISTAT), Italy)
Copyright: 2017
Pages: 21
Source title: Data Visualization and Statistical Literacy for Open and Big Data
Source Author(s)/Editor(s): Theodosia Prodromou (University of New England, Australia)
DOI: 10.4018/978-1-5225-2512-7.ch010

Abstract

Following the advent of Big Data, statistical offices have been widely exploring the use of the Internet as a data source to modernize their data collection processes. In particular, several statistical institutes collect prices online through a technique known as web scraping. This chapter discusses the challenges of using web scraping to set up a continuous data collection process, explores and classifies the most widespread techniques, and presents how they are used in practical cases. The main technical notions behind web scraping are presented and explained so that readers with no IT background also have sufficient elements to fully comprehend scraping techniques, promoting the mixed skills that are at the core of modern data science. The challenges that web scraping poses for official statistics are briefly sketched. Finally, research ideas for overcoming the limitations of current techniques are presented and discussed.
