Conditional Random Field with Lasso and its Application to the Classification of Barley Genes Based on Expression Level Affected by Fungal Infection
Abstract
Classifying genes by their expression level, a central task in gene expression analysis, is a major research area in statistics. Several classical methods address this classification problem. Applying the Logistic Regression Model (LRM) and other classical methods requires the data to satisfy independence assumptions: the observations in the dataset must be independent of each other, and the predictors (independent variables) must be independent of one another. These assumptions are usually violated in gene expression analysis. Although the classical Hidden Markov Model (HMM) can handle dependence between observations, it requires the independent variables in the dataset to be discrete and independent. Gene expression level, however, is a continuous variable. To solve the classification problem for gene expression level data, the Conditional Random Field (CRF) is introduced. Finally, the Least Absolute Shrinkage and Selection Operator (LASSO) penalty, a dimension-reduction (variable-selection) method, is introduced to improve the CRF model.
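To make the combination of a CRF with a LASSO penalty more concrete, the following is a minimal sketch. It assumes a linear-chain CRF in which each position's continuous expression features enter the per-label scores linearly, with an L1 (LASSO) penalty on the feature weights. The chain structure, the weight matrix `W`, the transition matrix `trans`, the penalty weight `lam`, and all function names are illustrative assumptions, not the thesis's implementation. The sketch only evaluates the penalized objective; it does not fit the model.

```python
import numpy as np

def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))) along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis) if axis is not None else out.item()

def crf_log_partition(unary, trans):
    """Forward algorithm in log space for a linear-chain CRF (assumed structure).

    unary: (T, K) per-position label scores; trans: (K, K) transition scores,
    where trans[i, j] scores moving from label i to label j.
    """
    alpha = unary[0].copy()
    for t in range(1, unary.shape[0]):
        # alpha_new[j] = log sum_i exp(alpha[i] + trans[i, j]) + unary[t, j]
        alpha = logsumexp(alpha[:, None] + trans, axis=0) + unary[t]
    return logsumexp(alpha)

def sequence_score(unary, trans, labels):
    """Unnormalized log-score of one label sequence."""
    labels = np.asarray(labels)
    score = unary[np.arange(len(labels)), labels].sum()
    score += trans[labels[:-1], labels[1:]].sum()
    return score

def lasso_crf_objective(W, trans, X, labels, lam):
    """Penalized negative log-likelihood: -log p(y | X) + lam * ||W||_1.

    X: (T, p) continuous features (e.g. expression levels), used directly
    without discretization; W: (p, K) hypothetical feature weights.
    """
    unary = X @ W
    nll = crf_log_partition(unary, trans) - sequence_score(unary, trans, labels)
    return nll + lam * np.abs(W).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))          # 5 positions, 3 continuous features
    W = rng.normal(size=(3, 2))          # 2 classes
    trans = rng.normal(size=(2, 2))
    print(lasso_crf_objective(W, trans, X, [0, 1, 1, 0, 1], lam=0.1))
```

The L1 term is what lets the penalty act as a variable-selection device: when the objective is minimized over `W`, larger values of `lam` push more feature weights to exactly zero. This differs from an L2 penalty, which only shrinks the weights.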