A New Structural Feature for Lysine Post-Translational Modification Prediction Using Machine Learning
Abstract
Lysine post-translational modification (PTM) plays a vital role in modulating multiple biological processes and functions. Lab-based identification of lysine PTM sites is laborious and time-consuming, which impedes large-scale screening. Many computational tools have been proposed to facilitate PTM identification in silico using sequence-based protein features. Protein structure is another crucial aspect of a protein that should not be neglected, yet to the best of our knowledge no structural feature has been designed specifically for PTM identification. We propose a novel spatial feature that captures rich structural information in a succinct form. Its dimension is much lower than that of the sequence and structural features used in previous studies. When used to predict lysine malonylation sites, the proposed feature achieved performance comparable to state-of-the-art methods that rely on features of much higher dimension. This low dimensionality should be helpful for building interpretable predictors for various applications involving protein structures. We further attempted to develop a reliable benchmark dataset and evaluated the performance of multiple sequence- and structure-based features in prediction. The results indicate that our proposed spatial feature achieves competitive performance and that other structural features can also contribute to PTM prediction. Although the use of protein structure in lysine PTM prediction is still at an early stage, we expect structure-based features to play a more important role in PTM site prediction.