Applications of Machine Learning for Linguistic Analysis of Texts

Author(s): Rosemary Torney (University of Ballarat, Australia), John Yearwood (Federation University, Australia), Peter Vamplew (University of Ballarat, Australia) and Andrei V. Kelarev (University of Ballarat, Australia)
Copyright: 2012
Pages: 16
Source title: Machine Learning Algorithms for Problem Solving in Computational Applications: Intelligent Techniques
Source Author(s)/Editor(s): Siddhivinayak Kulkarni (University of Ballarat, Australia)
DOI: 10.4018/978-1-4666-1833-6.ch008

Keywords: Artificial Intelligence / Computational Intelligence / Computer Science & IT / Information Science Reference

Abstract

This chapter describes a novel multistage method for linguistic clustering of large collections of texts available on the Internet, as a precursor to linguistic analysis of these texts. The method makes clustering practical for a very large set of text documents by combining unsupervised clustering with supervised classification. It begins by creating a multitude of independent clusterings of a randomized sample selected from the International Corpus of Learner English. Several consensus functions and sophisticated algorithms are then applied in two substages to combine these independent clusterings into one final consensus clustering, which is used to train fast classifiers that can profile very large collections of text and web data. This approach makes it possible to apply advanced, highly accurate clustering techniques at scale by combining them with fast supervised classification algorithms. The effectiveness of the multistage method depends crucially on how well the supervised classification algorithms perform at the final stage, when they are used to process large data sets available on the Internet. This performance may also serve as an indication of the quality of the combined consensus clustering obtained in the preceding stages. The authors' experimental results compare the performance of several classification algorithms incorporated in this multistage scheme and demonstrate that several of them achieve very high precision and recall and can be used in practical implementations of the method.
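
The pipeline outlined in the abstract (independent clusterings of a sample, a consensus clustering, then a fast classifier trained on the consensus labels) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the chapter's method: the two-dimensional feature vectors, the hand-written base clusterings, the majority-vote co-association rule standing in for the chapter's consensus functions, and the nearest-centroid classifier standing in for the classifiers the authors compare.

```python
from itertools import combinations


def consensus_labels(clusterings):
    """Merge base clusterings with a majority-vote co-association rule.

    Two items are linked when more than half of the base clusterings put
    them in the same cluster; consensus clusters are the connected
    components of those links. (Illustrative stand-in consensus function.)
    """
    n, m = len(clusterings[0]), len(clusterings)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        votes = sum(labels[i] == labels[j] for labels in clusterings)
        if 2 * votes > m:
            parent[find(j)] = find(i)

    # Renumber components 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def train_nearest_centroid(vectors, labels):
    """Return one mean vector per consensus label (hypothetical fast classifier)."""
    sums, counts = {}, {}
    for vec, lab in zip(vectors, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for d, x in enumerate(vec):
            acc[d] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def classify(centroids, vec):
    """Assign a new document vector to the label of the closest centroid."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], vec)),
    )


# Hypothetical feature vectors for a small sample of six documents.
sample = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]

# Assumed outputs of three independent clustering runs on that sample.
base_clusterings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition, different label names
    [0, 0, 1, 1, 1, 1],  # a noisy run that misplaces the third document
]

final = consensus_labels(base_clusterings)
model = train_nearest_centroid(sample, final)
```

The consensus step is indifferent to how each run names its clusters and outvotes the single noisy run, and the trained `model` can then label unseen vectors with `classify(model, vec)` at a cost proportional to the number of clusters rather than the size of the collection.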
