Deriving Document Keyphrases for Text Mining

Author(s): Yi-fang Brook Wu (New Jersey Institute of Technology, USA) and Quanzhi Li (Avaya, Inc., USA)
Copyright: 2009
Pages: 14
Source title: Handbook of Research on Text and Web Mining Technologies
Source Author(s)/Editor(s): Min Song (New Jersey Institute of Technology, USA) and Yi-Fang Brook Wu (New Jersey Institute of Technology, USA)
DOI: 10.4018/978-1-59904-990-8.ch002

Keywords: Data Mining / Data Mining and Databases / Information Science Reference / Library & Information Science

Abstract

Document keyphrases provide semantic metadata that characterize a document and give an overview of its content. This chapter describes the Keyphrase Identification Program (KIP), which extracts document keyphrases by using prior positive samples of human-identified domain keyphrases to assign weights to candidate keyphrases. The logic of our algorithm is: the more keywords a candidate keyphrase contains, and the more significant those keywords are, the more likely the candidate is a keyphrase. To obtain human-identified positive inputs, KIP first populates its glossary database with manually identified keyphrases and keywords. It then checks the composition of every noun phrase extracted from a document, looks up the database, and calculates a score for each noun phrase. The noun phrases with the highest scores are extracted as keyphrases. KIP’s learning function can enrich the glossary database by automatically adding newly identified keyphrases to it.
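
The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not KIP's actual implementation: the abstract does not give the weighting formula, so the additive score, the bonus for phrases already in the glossary, the default weight given to newly learned keywords, and all names (Glossary, score_phrase, extract_keyphrases) are assumptions made for illustration. Noun-phrase extraction is also assumed to have been done already, so the function takes candidate phrases as input.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Glossary:
    """Hypothetical stand-in for KIP's glossary database."""
    keyword_weights: Dict[str, float] = field(default_factory=dict)
    keyphrases: Set[str] = field(default_factory=set)

    def learn(self, phrase: str, default_weight: float = 0.5) -> None:
        """Add a newly identified keyphrase (assumed learning rule)."""
        phrase = phrase.lower()
        self.keyphrases.add(phrase)
        for word in phrase.split():
            # Existing weights are kept; unseen words get an assumed default.
            self.keyword_weights.setdefault(word, default_weight)


def score_phrase(phrase: str, glossary: Glossary,
                 known_phrase_bonus: float = 1.0) -> float:
    """Sum the weights of the glossary keywords the phrase contains.

    More keywords and heavier keywords both raise the score. The bonus for
    phrases already stored as keyphrases is an assumption of this sketch.
    """
    phrase = phrase.lower()
    score = sum(glossary.keyword_weights.get(w, 0.0) for w in phrase.split())
    if phrase in glossary.keyphrases:
        score += known_phrase_bonus
    return score


def extract_keyphrases(noun_phrases: List[str], glossary: Glossary,
                       top_n: int = 3) -> List[str]:
    """Rank candidate noun phrases by score and keep the top scorers."""
    candidates = {p.lower() for p in noun_phrases}
    ranked = sorted(candidates, key=lambda p: (-score_phrase(p, glossary), p))
    return [p for p in ranked[:top_n] if score_phrase(p, glossary) > 0]


if __name__ == "__main__":
    glossary = Glossary(
        keyword_weights={"text": 0.6, "mining": 0.9,
                         "keyphrase": 1.0, "extraction": 0.8},
        keyphrases={"text mining"},
    )
    phrases = ["text mining", "keyphrase extraction", "sample pdf", "mining"]
    top = extract_keyphrases(phrases, glossary, top_n=2)
    print(top)
    for phrase in top:
        glossary.learn(phrase)  # feed extracted keyphrases back into the glossary
```

In this sketch a phrase with no glossary keywords scores zero and is never returned, and calling learn on the extracted phrases mimics the feedback loop the abstract describes.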
