CIS Seminar: Sheng Liu

"Large Margin Classifiers and Random Forests for Integrated Biological Prediction on Mixed Type Data"
Wednesday, 6 October, 3:00 p.m.
235 Weir Hall

Sheng Liu, MS student
Computer and Information Science

This is a preview of a talk Liu will give at the 7th Annual Biotechnology and Bioinformatics Symposium (BIOT 2010).

Abstract: Incorporating various sources of biological information is important for biological discovery. For example, genes have a multi-view representation. They can be represented by features such as sequence length and physical-chemical properties. They can also be represented by pairwise similarities, gene expression levels, and phylogenetic position. Hence, the data types range from numerical to categorical features. An efficient way of learning from observations with a multi-view representation of mixed-type data is thus important.

We propose a large margin random forest classification approach based on random forest proximity. Random forests accommodate mixed data types naturally. Large margin classifiers are obtained from the random forest proximity kernel or its derivative kernels. We tested the approach on four biological datasets. Its performance is promising compared with other state-of-the-art methods, including support vector machines (SVMs) and random forest classifiers. The approach shows high potential for discovering the functional roles of genes and proteins. We also examine the effects of mixed-type data on the algorithms used.
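
As a rough illustration of the proximity idea mentioned in the abstract, the sketch below computes a random forest proximity matrix: the proximity of two samples is the fraction of trees in which they land in the same leaf. This is a minimal sketch and not the speaker's implementation. The function name `proximity_matrix` and the leaf assignments are hypothetical; in practice the leaf indices would come from a trained forest (for example, scikit-learn's `RandomForestClassifier.apply`), and the resulting matrix could be passed to a kernel classifier that accepts a precomputed kernel (for example, `SVC(kernel="precomputed")`).

```python
from typing import List, Sequence


def proximity_matrix(leaf_ids: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return the pairwise random forest proximity matrix.

    leaf_ids[i][t] is the index of the leaf that sample i reaches in tree t.
    The proximity of samples i and j is the fraction of trees in which
    both samples fall into the same leaf.
    """
    n = len(leaf_ids)
    if n == 0:
        return []
    n_trees = len(leaf_ids[0])
    if n_trees == 0:
        raise ValueError("at least one tree is required")

    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Count the trees in which samples i and j share a leaf.
            shared = sum(1 for a, b in zip(leaf_ids[i], leaf_ids[j]) if a == b)
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Hypothetical leaf assignments: 3 samples, 4 trees (made up for illustration).
leaves = [
    [0, 2, 1, 3],
    [0, 2, 4, 3],
    [5, 1, 4, 0],
]
K = proximity_matrix(leaves)
for row in K:
    print(row)
```

Because each tree contributes a block-structured indicator matrix, the averaged proximity matrix is symmetric and positive semidefinite, which is what allows it to serve as a kernel for a large margin classifier.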

Bio: Sheng Liu received his Bachelor of Science in Biochemistry from Wuhan University. He is currently an MS student in Computer and Information Science at the University of Mississippi. His research interests include bioinformatics, computational biology, and machine learning. Liu is a member of the data provenance research group led by Dr. Dawn Wilkins and Dr. Yixin Chen.