The IRMA Community
Newsletters
Research IRM
Click a keyword to search titles using our InfoSci-OnDemand powered search:
|
Using the Text Categorization Framework for Protein Classification
Abstract
In this chapter, we are interested in proteins classification starting from their primary structures. The goal is to automatically affect proteins sequences to their families. The main originality of the approach is that we directly apply the text categorization framework for the protein classification with very minor modifications. The main steps of the task are clearly identified: we must extract features from the unstructured dataset, we use the fixed length n-grams descriptors; we select and combine the most relevant one for the learning phase; and then, we select the most promising learning algorithm in order to produce accurate predictive model. We obtain essentially two main results. First, the approach is credible, giving accurate results with only 2-grams descriptors length. Second, in our context where many irrelevant descriptors are automatically generated, we must combine aggressive feature selection algorithms and low variance classifiers such as SVM (Support Vector Machine).
Related Content
.
© 2023.
34 pages.
|
.
© 2023.
15 pages.
|
.
© 2023.
15 pages.
|
.
© 2023.
18 pages.
|
.
© 2023.
24 pages.
|
.
© 2023.
32 pages.
|
.
© 2023.
21 pages.
|
|
|