Mining Structure Patterns Based on 3D Features in the Protein-DNA and Protein-protein Complex
Abstract
For a long time, researchers have been searching for the “recognition codes” of protein complexes, which determine what DNA sequence or other protein a given protein can bind to. Predicting the binding parts of protein complexes has important applications in many areas of biological research, such as cleavage enzyme design and drug design. Most sequence-based and position weight matrix (PWM) methods capture only the sequence features of protein interfaces and ignore the crucial spatial attributes of those features. This study investigates the recognition codes from a new angle, in which the preferred binding modes are captured using local structural motifs spanning the protein-DNA, protein-protein and protein-ligand interfaces. Using a product graph, we transformed the structural motif discovery problem into a search for maximal cliques. These motifs carry more information than the traditional amino acid-base contacting pairs. For example, in our study of protein-DNA interfaces, we examined two domains, Zinc-finger and Helix-Turn-Helix (HTH), both of which use a recognition helix to interact with DNA. In each domain, we found a few frequent structural motifs spanning the protein-DNA interface. Each motif includes at least two amino acids and one nucleotide, drawn from both sides of the interface. The motifs specify not only the types of amino acids and nucleotides involved in the interaction, but also the distances between them and their relative orientation. The same method has also been applied to protein-protein and protein-ligand complexes. These motifs reveal preferred binding modes at the interfaces that involve more entities than the traditional contacting pairs. The biological and statistical significance of the motifs was confirmed using evolutionary conservation analysis and bootstrapping. We also performed a number of other tests to evaluate the critical roles of our motifs in the interactions. For example, we compared our motifs with experimentally verified hotspots.
We also compared our method with other computational prediction methods to assess its effectiveness. Our results confirmed that the graph motifs discovered in this study play important roles in protein-DNA, protein-protein and protein-ligand interactions. We believe that the proposed graph method will be a helpful tool for studying interactions in protein complexes and other types of molecular interactions.
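
The product-graph formulation mentioned in the abstract can be illustrated with a small sketch. This is not the implementation used in the study; it is a minimal example under stated assumptions. Each interface is assumed to be a set of labeled points (residue or nucleotide type plus 3D coordinates), two product-graph vertices are assumed compatible when the paired distances agree within a tolerance `tol`, and orientation is ignored. The names `product_graph`, `maximal_cliques`, `tol` and the toy coordinates are all hypothetical.

```python
from itertools import combinations
from math import dist


def product_graph(a, b, tol=1.0):
    """Build a product graph of two interfaces (illustrative assumption).

    a, b: dict mapping node id -> (label, (x, y, z)).
    A product vertex pairs one node from each interface with the same label.
    Two product vertices are joined when the distance between their nodes
    in `a` matches the distance between their nodes in `b` within `tol`.
    """
    nodes = [(u, v) for u in a for v in b if a[u][0] == b[v][0]]
    adj = {n: set() for n in nodes}
    for (u1, v1), (u2, v2) in combinations(nodes, 2):
        if u1 == u2 or v1 == v2:
            continue  # a node may be matched only once
        d_a = dist(a[u1][1], a[u2][1])
        d_b = dist(b[v1][1], b[v2][1])
        if abs(d_a - d_b) <= tol:
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj


def maximal_cliques(adj):
    """Yield every maximal clique (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for n in list(p - adj[pivot]):
            yield from expand(r | {n}, p & adj[n], x & adj[n])
            p = p - {n}
            x = x | {n}
    yield from expand(set(), set(adj), set())


# Toy interfaces with made-up coordinates (in angstroms).
iface_a = {
    "a1": ("ARG", (0.0, 0.0, 0.0)),
    "a2": ("ASN", (4.0, 0.0, 0.0)),
    "a3": ("G", (0.0, 3.0, 0.0)),
}
iface_b = {
    "b1": ("ARG", (10.0, 10.0, 10.0)),
    "b2": ("ASN", (14.0, 10.0, 10.0)),
    "b3": ("G", (10.0, 13.0, 10.0)),
    "b4": ("LYS", (20.0, 20.0, 20.0)),
}

cliques = list(maximal_cliques(product_graph(iface_a, iface_b)))
largest = max(cliques, key=len)
```

In this toy case the largest clique pairs the arginine, asparagine and guanine of one interface with their counterparts in the other, which corresponds to a shared motif of two amino acids and one nucleotide with matching pairwise distances.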