Improve the Query Hit List Precision by Documents Clustering Technique

Author(s): Ciya Liao (Oracle Corporation, USA), Shamim Alpha (Oracle Corporation, USA) and Paul Dixon (Oracle Corporation, USA)
Copyright: 2004
Pages: 4
Source title: Innovations Through Information Technology
Source Editor(s): Mehdi Khosrow-Pour, D.B.A. (Information Resources Management Association, USA)
DOI: 10.4018/978-1-59140-261-9.ch050
ISBN13: 9781616921255
EISBN13: 9781466665347

Abstract

We propose a new approach to improving query hit list precision in document information retrieval. We use the k-means clustering technique to group the documents returned in the hit list. The relevancy of each cluster is evaluated from the relevancy scores of the documents it contains. The final relevancy score of each document combines the relevancy score of its cluster with the document's own relevancy score. To form clusters whose features are more closely related to the query, we use pseudo-feedback documents to construct a latent semantic index (LSI), which transforms all the documents in the hit list into LSI feature vectors. These feature vectors, built from query-relevant features, are the input to the clustering algorithm. We show that an LSI built from relevant documents significantly improves the coherence of the hit list clusters, in the sense that the clusters separate query-relevant documents from irrelevant ones. We also show that this improved cluster quality, which results in better separation between relevant and irrelevant documents, can be used to significantly improve the precision of a query hit list.
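
The score-combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how a cluster's relevancy is computed or how the two scores are combined, so the choices here are assumptions made only for illustration: the cluster score is taken as the mean of its members' scores, and the final score is a linear blend controlled by a hypothetical weight `alpha`. The function name `rerank`, the document ids, and the cluster labels are likewise hypothetical; the cluster labels stand in for the output of k-means over LSI feature vectors, which is not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def rerank(doc_scores, cluster_of, alpha=0.5):
    """Re-rank a hit list by blending document and cluster relevancy.

    doc_scores: dict mapping doc id -> original relevancy score
    cluster_of: dict mapping doc id -> cluster label (e.g. from k-means)
    alpha:      assumed weight on the cluster score (0 = ignore clusters)
    """
    # Collect the original scores of the documents in each cluster.
    members = defaultdict(list)
    for doc_id, score in doc_scores.items():
        members[cluster_of[doc_id]].append(score)

    # Assumption: a cluster's relevancy is the mean of its members' scores.
    cluster_score = {label: mean(scores) for label, scores in members.items()}

    # Assumption: the final score is a linear blend of cluster and document scores.
    final = {
        doc_id: alpha * cluster_score[cluster_of[doc_id]] + (1 - alpha) * score
        for doc_id, score in doc_scores.items()
    }

    # Return doc ids ordered by final score, best first.
    return sorted(final, key=final.get, reverse=True)


# Toy hit list: d3 scores well on its own but sits in a weak cluster.
doc_scores = {"d1": 0.9, "d2": 0.8, "d3": 0.85, "d4": 0.2}
cluster_of = {"d1": "A", "d2": "A", "d3": "B", "d4": "B"}

print(rerank(doc_scores, cluster_of, alpha=0.0))  # original ordering
print(rerank(doc_scores, cluster_of, alpha=0.5))  # cluster-aware ordering
```

In this toy example, `d2` moves ahead of `d3` once the cluster score is taken into account, because `d3` shares a cluster with the low-scoring `d4`. This is the kind of effect the abstract relies on: when clusters separate relevant from irrelevant documents well, the cluster score pulls documents in weak clusters down the list.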
