Search Results

  • A New Structural Feature for Lysine Post-Translation Modification Prediction Using Machine Learning
    (North Dakota State University, 2021) Liu, Yuan
    Lysine post-translational modification (PTM) plays a vital role in modulating multiple biological processes and functions. Lab-based lysine PTM identification is laborious and time-consuming, which impedes large-scale screening. Many computational tools have been proposed to facilitate PTM identification in silico using sequence-based protein features. Protein structure is another crucial aspect of proteins that should not be neglected. To the best of our knowledge, there is no structural feature dedicated to PTM identification. We proposed a novel spatial feature that captures rich structural information in a succinct form. The dimension of this feature is much lower than that of the other sequence and structural features used in previous studies. When the proposed feature was used to predict lysine malonylation sites, it achieved performance comparable to that of other state-of-the-art methods that use features of much higher dimension. The low dimensionality of the proposed feature should be helpful for building interpretable predictors for various applications involving protein structures. We further attempted to develop a reliable benchmark dataset and to evaluate the performance of multiple sequence- and structure-based features in prediction. The results indicated that our proposed spatial feature achieved competitive performance and that other structural features can also contribute to PTM prediction. Even though the use of protein structure in lysine PTM prediction is still at an early stage, we can expect structure-based features to play a more crucial role in PTM site prediction.
  • Optimizing Prediction Power of RNA-seq on Intrinsic Characteristics in Breast Cancer
    (North Dakota State University, 2022) Liu, Yuan
    Breast cancer is the most common cancer in women worldwide, and accurate and early detection of breast cancer is vital in characterizing the disease. Transcriptomic expression data carry abundant information about tumor and cell state. However, selecting a good pipeline for applying mRNA expression data is critical for downstream prediction of tumor characteristics. We designed a study to determine the best combination of preprocessing steps for prediction. We tested six normalization methods, two gene selection methods, and more than ten machine learning algorithms. Based on appropriate evaluation metrics, we recommend FPKM normalization combined with either gene selection method and a random forest (RF) model for downstream prediction in breast cancer.
  • Genetic Dissection of Tan Spot Resistance in Wheat
    (North Dakota State University, 2020) Liu, Yuan
    Tan spot, caused by the necrotrophic fungal pathogen Pyrenophora tritici-repentis (Ptr), is a major foliar disease in wheat. QTL mapping and meta-QTL analysis are effective methods for understanding the genetic basis of tan spot resistance, which can in turn facilitate the development of resistant varieties. A number of QTL mapping studies have been conducted in hexaploid bread wheat, whereas few have been carried out in tetraploid wheat. Four interconnected tetraploid wheat mapping populations were evaluated for resistance to race 2 isolate 86-124. Twelve QTL were identified in three of the four mapping populations. To further extend our understanding of tan spot resistance, a meta-QTL analysis was conducted using QTL reported in 14 previous QTL mapping studies. Three meta-QTL located on chromosomes 2A, 3B, and 5A showed large genetic effects in multiple populations and conferred resistance to multiple races. Integrating these race-nonspecific QTL could provide high and stable tan spot resistance in wheat.