2015 NDSU Bean Breeding Program Genotyping Snapshot

Simons, Kristin J.; McClean, Phillip E.; Osorno, Juan M.; Oladzad, Atena; Pasche, Julie S.; Lamppa, Robin

Files

MiddleAmericanCBBPhenotype.txt (11.71 KB)

AndeanCBBPhenotype.txt (2.233 KB)

MiddleAmericanImputedSNPDataset.hmp (87.63 MB)

AndeanImputedSNPDataset.hmp (13.52 MB)
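The .hmp extension suggests the HapMap text format used by tools such as TASSEL. A minimal sketch for loading one of these files, assuming the common HapMap layout: 11 tab-separated marker-description columns (rs#, alleles, chrom, pos, ...) followed by one genotype column per line. The function name `read_hapmap` and the toy marker and line names are hypothetical, so check the real file header before relying on this layout.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

# Assumption: the .hmp files use the common HapMap text layout, in which the
# first 11 tab-separated columns describe each marker and every remaining
# column holds the genotype calls for one line.
HAPMAP_META_COLUMNS = 11


def read_hapmap(handle: TextIO) -> Tuple[List[str], List[Dict[str, str]], List[List[str]]]:
    """Return (line names, per-marker metadata, per-marker genotype calls)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    meta_names = header[:HAPMAP_META_COLUMNS]
    lines = header[HAPMAP_META_COLUMNS:]
    markers: List[Dict[str, str]] = []
    calls: List[List[str]] = []
    for row in reader:
        if not row:  # skip blank lines such as a trailing empty line
            continue
        markers.append(dict(zip(meta_names, row[:HAPMAP_META_COLUMNS])))
        calls.append(row[HAPMAP_META_COLUMNS:])
    return lines, markers, calls


# Hypothetical two-line, one-marker example; a real file can be passed as
# open("AndeanImputedSNPDataset.hmp", newline="") instead of the StringIO.
EXAMPLE = (
    "rs#\talleles\tchrom\tpos\tstrand\tassembly#\tcenter\tprotLSID\tassayLSID\tpanelLSID\tQCcode\tLine1\tLine2\n"
    "marker_1\tA/G\t1\t1000\t+\tNA\tNA\tNA\tNA\tNA\tNA\tA\tG\n"
)
line_names, markers, calls = read_hapmap(io.StringIO(EXAMPLE))
print(line_names, markers[0]["pos"], calls[0])  # ['Line1', 'Line2'] 1000 ['A', 'G']
```

The parser keeps genotype codes as plain strings, so it works whether the file stores single-letter (IUPAC) or two-letter diploid calls; decoding them is left to downstream analysis.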

Abstract

The dataset consists of genotyping and common bacterial blight (CBB) phenotyping information for genotypes within the NDSU Dry Bean Breeding Program. The Middle American dataset consists of 713 genotypes and the Andean dataset consists of 139 genotypes. Both Middle American and Andean lines were phenotyped for CBB at the unifoliate and trifoliate stages, and the median scores were recorded.

DNA was isolated from each line and sequenced on a single-end Illumina platform. Sequences were quality-trimmed using SICKLE and aligned to the Phaseolus vulgaris v2.1 reference sequence (DOE-JGI and USDA-NIFA, http://phytozome.jgi.doe.gov) using BWA-MEM, and the alignments were sorted and indexed using SAMtools. Read groups, including library ID, platform and platform unit, were added to each alignment in the BAM files using Picard (http://broadinstitute.github.io/picard/). UnifiedGenotyper from GATK 3.6 (DePristo et al. 2011) was used to call variants with quality scores above 10; variants with quality scores between 10 and 30 were marked as low quality. Calls with a read depth of less than two were filtered using GATK 3.6 VariantFiltration and replaced with missing data. Low-quality variants were removed by hard filtering when they had more than 25% missing data (50% in the Middle American dataset), spanned more than one nucleotide, had more than two alleles, or had a minor allele frequency below 5% (1% in the Middle American dataset). Genotypes (lines) with more than 90% missing data were removed. Missing data at SNPs with less than 25% missing data (50% in the Middle American dataset) were imputed with fastPHASE, and the output was converted to HapMap (.hmp) format for distribution.

The dataset was used to identify genomic regions associated with resistance to common bacterial blight in dry beans and can be mined for other SNPs of interest.
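The depth masking and hard-filtering rules described above can be sketched as follows. This is a minimal illustration, not the authors' GATK pipeline: it assumes diploid calls and their read depths are already held in memory, and the `Site` structure, the function names, the way minor allele frequency is computed (from non-missing calls only) and the choice to screen lines after the site filters are all hypothetical. Only the numeric thresholds come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory representation; not a GATK or fastPHASE structure.
Genotype = Optional[Tuple[str, str]]  # diploid call, or None when missing

# Hard-filter thresholds as stated in the abstract.
THRESHOLDS: Dict[str, Dict[str, float]] = {
    "Andean": {"max_site_missing": 0.25, "min_maf": 0.05},
    "MiddleAmerican": {"max_site_missing": 0.50, "min_maf": 0.01},
}
MAX_LINE_MISSING = 0.90  # lines with more missing data than this are dropped
MIN_DEPTH = 2  # calls supported by fewer reads become missing


@dataclass
class Site:
    name: str
    alleles: List[str]  # reference allele first, then alternate allele(s)
    calls: List[Genotype]  # one call per line, in sample order
    depths: List[int]  # read depth behind each call


def mask_low_depth(site: Site, min_depth: int = MIN_DEPTH) -> Site:
    """Replace calls supported by fewer than min_depth reads with missing data."""
    calls = [c if d >= min_depth else None for c, d in zip(site.calls, site.depths)]
    return Site(site.name, site.alleles, calls, site.depths)


def passes_site_filters(site: Site, max_missing: float, min_maf: float) -> bool:
    """Keep biallelic single-nucleotide sites with little missing data and a common minor allele."""
    if len(site.alleles) != 2 or any(len(a) != 1 for a in site.alleles):
        return False
    n = len(site.calls)
    observed = [c for c in site.calls if c is not None]
    if n == 0 or (n - len(observed)) / n > max_missing:
        return False
    # Assumption: minor allele frequency is computed from non-missing calls only.
    counts = [sum(call.count(a) for call in observed) for a in site.alleles]
    total = sum(counts)
    return total > 0 and min(counts) / total >= min_maf


def line_missing_fraction(sites: List[Site], j: int) -> float:
    return sum(s.calls[j] is None for s in sites) / len(sites) if sites else 0.0


def filter_dataset(samples: List[str], sites: List[Site], panel: str) -> Tuple[List[str], List[Site]]:
    limits = THRESHOLDS[panel]
    sites = [mask_low_depth(s) for s in sites]
    sites = [s for s in sites if passes_site_filters(s, limits["max_site_missing"], limits["min_maf"])]
    # Assumption: lines are screened after the site filters, following the order in the text.
    keep = [j for j in range(len(samples)) if line_missing_fraction(sites, j) <= MAX_LINE_MISSING]
    kept_sites = [
        Site(s.name, s.alleles, [s.calls[j] for j in keep], [s.depths[j] for j in keep])
        for s in sites
    ]
    return [samples[j] for j in keep], kept_sites


# Toy example with hypothetical line and site names; line L4 has no calls at all.
samples = ["L1", "L2", "L3", "L4"]
sites = [
    Site("s1", ["A", "G"], [("A", "A"), ("A", "G"), ("G", "G"), None], [5, 5, 5, 0]),
    Site("s2", ["A", "AT"], [("A", "A"), ("A", "A"), ("A", "A"), None], [5, 5, 5, 0]),
    Site("s3", ["C", "T"], [("C", "C"), ("C", "C"), ("C", "C"), None], [5, 5, 5, 0]),
    Site("s4", ["A", "C", "G"], [("A", "C"), ("C", "G"), ("A", "A"), None], [5, 5, 5, 0]),
    Site("s5", ["T", "C"], [("T", "T"), ("T", "C"), ("C", "C"), None], [1, 5, 5, 0]),
]
andean_lines, andean_sites = filter_dataset(samples, sites, "Andean")
ma_lines, ma_sites = filter_dataset(samples, sites, "MiddleAmerican")
print(andean_lines, [s.name for s in andean_sites])  # ['L1', 'L2', 'L3'] ['s1']
print(ma_lines, [s.name for s in ma_sites])  # ['L1', 'L2', 'L3'] ['s1', 's5']
```

In the toy data, the indel (s2), the triallelic site (s4) and the monomorphic site (s3) are removed under both thresholds; s5, which has 50% missing data after depth masking, survives only the looser Middle American limit; and line L4, with no calls, is dropped by the 90% rule.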

These datasets were used in a study available in the NDSU repository at https://hdl.handle.net/10365/32840

URI

https://hdl.handle.net/10365/31610

NDSU Repository