Information Resources Management Association

A Unified Algorithm for Identification of Various Tabular Structures from Document Images

Author(s): Sekhar Mandal (Bengal Engineering & Science University, Shibpur, India), Amit K. Das (Bengal Engineering & Science University, Shibpur, India), Partha Bhowmick (Indian Institute of Technology Kharagpur, India), and Bhabatosh Chanda (Indian Statistical Institute, Kolkata, India)
Copyright: 2013
Pages: 28
Source title: Modern Library Technologies for Data Storage, Retrieval, and Use
Source Author(s)/Editor(s): Chia-Hung Wei (Ching Yun University, Taiwan)
DOI: 10.4018/978-1-4666-2928-8.ch001

Abstract

This paper presents a unified algorithm for segmenting and identifying various tabular structures in document page images. These structures include conventional tables and displayed math-zones, as well as Table of Contents (TOC) and Index pages. The algorithm first analyzes the page composition and classifies the input document pages as tabular or non-tabular. A tabular page contains at least one tabular structure, whereas a non-tabular page contains none. The approach is unified in the sense that it can identify all tabular structures on a tabular page, which considerably simplifies document image segmentation. This unification also speeds up segmentation, because existing methods treat the different tabular structures as separate physical entities and are therefore time-consuming. Distinguishing features of the different kinds of tabular structures are applied in stages to keep the algorithm simple and efficient, as demonstrated by exhaustive experimental results.
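
The definition of tabular and non-tabular pages given in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not describe how structures are detected, so the `Page` container, the `structures` field, and the `split_pages` helper are hypothetical names introduced only for illustration. The sketch assumes that some detector has already listed the structures found on each page.

```python
from dataclasses import dataclass, field
from enum import Enum


class TabularStructure(Enum):
    # The four kinds of tabular structure named in the abstract.
    TABLE = "table"
    MATH_ZONE = "displayed math-zone"
    TOC = "table of contents"
    INDEX = "index"


@dataclass
class Page:
    # Hypothetical container: `structures` is assumed to be filled in by
    # a detector that the abstract does not specify.
    number: int
    structures: list = field(default_factory=list)


def is_tabular(page):
    # A tabular page contains at least one tabular structure.
    return len(page.structures) > 0


def split_pages(pages):
    # Partition the input pages into (tabular, non_tabular), preserving order.
    tabular = [p for p in pages if is_tabular(p)]
    non_tabular = [p for p in pages if not is_tabular(p)]
    return tabular, non_tabular
```

A page with an empty `structures` list lands in the non-tabular group, and any page with one or more entries lands in the tabular group, regardless of which kinds of structure it holds.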
