CINF 19
Current clustering techniques can be grouped as either supervised or unsupervised. In a supervised method, each observation in the training dataset is pre-assigned to a class based on prior knowledge, whereas an unsupervised method uses no prior knowledge of the class distinctions. Numerous supervised techniques have been demonstrated to work well for binary classification, and a few of these are reasonably good at making supervised multi-class predictions. However, techniques for unsupervised binary and multi-class predictions have not been fully developed. In this work, we present an analysis technique based on hierarchical K-means using differentially weighted principal component analysis to address unsupervised classification for both binary and multi-class problems. We demonstrate the methodology on biological datasets (the NCI 60 cancer cell lines dataset and an acute leukemia dataset) and on chemical datasets, with the objectives of predicting class membership and identifying the non-redundant features most responsible for differentiating the observed classes.
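
The abstract does not specify the weighting scheme, the number of components, or the stopping rule, so the following is only a rough sketch of the general idea and not the authors' implementation. It assumes that "differentially weighted" means scaling each principal component score by its explained-variance ratio, and that "hierarchical K-means" means recursive 2-means splitting to a fixed depth. The function names (`weighted_pca_scores`, `two_means`, `hierarchical_kmeans`) and the parameters (`max_depth`, `min_size`, `n_components`) are hypothetical.

```python
import numpy as np


def weighted_pca_scores(X, n_components=2):
    """Project X onto its top principal components and scale each
    component by its explained-variance ratio (an assumed weighting)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, len(s))
    total = (s ** 2).sum()
    weights = (s[:k] ** 2) / total if total > 0 else np.ones(k)
    return U[:, :k] * s[:k] * weights


def two_means(Z, n_iter=50):
    """Plain 2-means (Lloyd's algorithm) with deterministic seeding."""
    # Seed with the point farthest from the mean, then the point farthest from it.
    c0 = Z[np.argmax(np.linalg.norm(Z - Z.mean(axis=0), axis=1))]
    c1 = Z[np.argmax(np.linalg.norm(Z - c0, axis=1))]
    centers = np.stack([c0, c1]).astype(float)
    labels = None
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in (0, 1):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def hierarchical_kmeans(X, max_depth=2, min_size=4, n_components=2):
    """Recursively bisect the data; the weighted PCA is recomputed
    inside every subset before it is split (an assumed design)."""
    X = np.asarray(X, dtype=float)
    labels = np.zeros(len(X), dtype=int)

    def split(idx, depth):
        if depth >= max_depth or len(idx) < min_size:
            return
        part = two_means(weighted_pca_scores(X[idx], n_components))
        if part.min() == part.max():  # no split found, stop here
            return
        right = idx[part == 1]
        labels[right] = labels.max() + 1  # fresh label for the new branch
        split(idx[part == 0], depth + 1)
        split(right, depth + 1)

    split(np.arange(len(X)), 0)
    return labels
```

Calling `hierarchical_kmeans(X)` on an observations-by-features array returns one integer cluster label per observation. Because this sketch stops only on depth and subset size, it can split a homogeneous group further; a real analysis would need a principled stopping criterion, which the abstract does not describe.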
Advances in Data-mining and Analysis: Informatics Perspective
8:00 AM-11:00 AM, Monday, 29 August 2005, Washington DC Convention Center -- 151B, Oral
Division of Chemical Information