Text Mining Methods for Hierarchical Document Indexing

Author(s): Han-Joon Kim (The University of Seoul, Korea)
Copyright: 2005
Pages: 7
Source title: Encyclopedia of Data Warehousing and Mining
Source Author(s)/Editor(s): John Wang (Montclair State University, USA)
DOI: 10.4018/978-1-59140-557-3.ch209

Abstract

The volume of online text documents available from networked resources such as the Internet, digital libraries, and company-wide intranets has grown tremendously in recent years. One of the most common and successful ways to organize such large collections is to categorize documents hierarchically by topic (Agrawal, Bayardo, & Srikant, 2000; Kim & Lee, 2003). Documents indexed according to a hierarchical structure (termed a ‘topic hierarchy’ or ‘taxonomy’) are kept in internal categories as well as in leaf categories, and documents in lower categories are increasingly specific. With a topic hierarchy, users can quickly navigate to any portion of a document collection without being overwhelmed by a large document space. As is evident from the popularity of Web directories such as Yahoo (http://www.yahoo.com/) and the Open Directory Project (http://dmoz.org/), topic hierarchies have grown in importance as a tool for organizing or browsing large volumes of electronic text documents.
