Strategies for Large-Scale Entity Resolution Based on Inverted Index Data Partitioning
Author(s): Yinle Zhou (IBM Corporation, USA) and John R. Talburt (University of Arkansas – Little Rock, USA)
Copyright: 2014
Pages: 23
Source title: Information Quality and Governance for Business Intelligence
Source Author(s)/Editor(s): William Yeoh (Deakin University, Australia), John R. Talburt (University of Arkansas at Little Rock, USA) and Yinle Zhou (IBM Corporation, USA)
DOI: 10.4018/978-1-4666-4892-0.ch017

Abstract

Inverted indexing is a commonly used technique for improving the performance of entity resolution algorithms by reducing the number of pairwise comparisons necessary to arrive at acceptable results. This chapter describes how inverted indexing can also be used as a data partitioning strategy to perform entity resolution on large datasets in a distributed processing environment. It also discusses the importance of index-to-rule alignment, pre-resolution index closure, and post-resolution link closure, along with workflows for record-based and attribute-based identity capture and update in a distributed processing environment.
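
To make the idea in the abstract concrete, here is a minimal Python sketch of inverted indexing used both to limit pairwise comparisons and to partition records. It is an illustration under stated assumptions, not the chapter's actual method: the record layout, the `index_keys` function, and the choice of keys (last name, and first initial plus ZIP code) are all hypothetical. The `partitions` function groups records by transitive closure over shared index keys, which is only one plausible reading of what an index closure step might do before resolution.

```python
from collections import defaultdict
from itertools import combinations


def index_keys(record):
    """Hypothetical index-key generator: last name, and first initial plus ZIP."""
    first, last, zip_code = record
    return {f"L:{last.lower()}", f"FZ:{first[0].lower()}{zip_code}"}


def build_inverted_index(records):
    """Map each index key to the set of record ids that produce it."""
    index = defaultdict(set)
    for rid, record in records.items():
        for key in index_keys(record):
            index[key].add(rid)
    return index


def candidate_pairs(index):
    """Collect pairs of records that share at least one index key."""
    pairs = set()
    for rids in index.values():
        pairs.update(combinations(sorted(rids), 2))
    return pairs


def partitions(records, index):
    """Group records by transitive closure over shared keys (union-find)."""
    parent = {rid: rid for rid in records}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for rids in index.values():
        ordered = sorted(rids)
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])

    groups = defaultdict(set)
    for rid in records:
        groups[find(rid)].add(rid)
    return sorted(sorted(group) for group in groups.values())


# Made-up sample records: (first name, last name, ZIP code).
records = {
    1: ("Mary", "Smith", "72204"),
    2: ("M.", "Smith", "72204"),
    3: ("Mary", "Smyth", "72204"),
    4: ("John", "Doe", "10001"),
    5: ("Jon", "Doe", "10001"),
    6: ("Ann", "Lee", "94105"),
}

index = build_inverted_index(records)
pairs = candidate_pairs(index)
all_pairs = len(records) * (len(records) - 1) // 2
parts = partitions(records, index)

print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
print("partitions:", parts)
```

In this sketch each partition could in principle be sent to a separate worker, because no candidate pair spans two partitions. Whether the index keys capture every pair that the matching rules could link is a separate question, which is presumably what the abstract means by index-to-rule alignment.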
