Leveraging Genomics and Transcriptomics for Gene Discovery in Dry Pea
Abstract
Over the past two decades, DNA marker-based mapping studies have been used increasingly to genetically map and improve complex quantitative traits. A major caveat of this approach is that mapping the underlying genes that confer target phenotypes is often difficult because of extensive long-range linkage disequilibrium (LD) in the genome, particularly in self-pollinated crops. Recent technologies make it possible to examine expression-phenotype associations through transcriptome-wide association studies (TWAS), which are less affected by LD than analyses based on genetic markers. This is of greatest utility in species where LD is extensive, such as dry pea, because gene expression patterns are independent of LD and individual genes can therefore be prioritized for association with a trait. The goal of this study is to use gene expression data collected from developing pea pods, together with the TWAS approach, to map and prioritize likely causal genes underlying seed protein content and yield. Because the effective population size (Ne) of the USDA (United States Department of Agriculture) diversity panel provides substantial genetic variation, we used 300 USDA pea lines from the collection and performed a comprehensive single-tissue, multi-environment TWAS across six diverse environments (2 years * 2 locations) in the major pea-growing regions of the USA. When we compared the TWAS results with those of genome-wide association studies (GWAS), we detected more strongly associated genes, including both common and unique sets. In all TWAS models, the significant genes exhibited clear differentiation, unlike in GWAS. A joint analysis of GWAS and TWAS results using Fisher's combined test (FCT) increased the power to detect trait-associated genes, including RGB. Using the GWAS, TWAS, and FCT models, we detected 45 genes for protein, 60 genes for yield, and 20 genes common to both traits.
These results highlight the complex interaction between genetic factors and environmental influences in shaping the genetic architecture of seed yield and protein content. Our study shows that a multi-omics strategy increases gene mapping resolution beyond that of GWAS or TWAS alone, and highlights the potential phenotypic consequences of regulatory variation in dry pea.
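As an illustration of the Fisher's combined test mentioned in the abstract, the following is a minimal sketch of how a GWAS p-value and a TWAS p-value for the same gene could be merged into one combined p-value. It is not the study's implementation, which the abstract does not describe. It assumes that the two p-values are independent under the null hypothesis and that each gene already has a single GWAS p-value assigned to it. The function name `fisher_combined`, the gene labels, and all p-values are hypothetical.

```python
import math


def fisher_combined(pvalues):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p). Under the null hypothesis the statistic
    -2 * sum(ln p) follows a chi-square distribution with 2k degrees of
    freedom, where k is the number of p-values.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # half of the test statistic
    # Closed-form chi-square survival function for even df = 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return 2.0 * half, math.exp(-half) * total


# Hypothetical per-gene p-values (illustrative only, not from the study).
gwas_p = {"gene_A": 0.05, "gene_B": 0.40}
twas_p = {"gene_A": 0.05, "gene_B": 0.001}

# One combined p-value per gene from its GWAS and TWAS evidence.
combined = {g: fisher_combined([gwas_p[g], twas_p[g]])[1] for g in gwas_p}
```

With two inputs the combined p-value reduces to p1 * p2 * (1 - ln(p1 * p2)), so a gene with moderate support in both analyses can rank above a gene supported by only one. The closed form avoids an external dependency. In practice a library routine such as SciPy's `scipy.stats.combine_pvalues` with `method='fisher'` could be used instead.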