The data displayed in this applet are derived from next-generation sequencing of 79 soybean lines. Twenty seeds from each line were acquired from ---. Seeds were planted in the USDA greenhouse at Iowa State University. Once plants reached the trifoliolate stage, leaves from up to 10 plants per line were pooled and genomic DNA was extracted. DNA was sent to the HudsonAlpha Institute for Biotechnology for next-generation sequencing. In addition, replicated field trials were conducted on a subset of lines (30 of the 79 lines, plus ancestral varieties that were not sequenced) to measure protein, oil, yield, and other characteristics under standard growth conditions, in order to separate the effect of on-farm improvements from genetic gain [1,2].
Duplicate reads were identified with the MarkDuplicates function in Picard tools. Reads were realigned around indels with the IndelRealigner function in GATK [5]. The ReduceReads function was used to compress the alignment files by removing non-informative and redundant reads (default parameters except for downsample_coverage=1).
Using the BAM files from the previous step as input to cn.mops, the program was executed separately on each type of genomic feature (gene, exon, CDS, mRNA), both to provide internal verification and to reduce the problem to a more computationally manageable size. As suggested in the cn.mops manual, each region of the genome was extended by 30 bp on each side to aid in the identification of CNV regions.
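The padding step can be illustrated with a minimal Python sketch. This is not the cn.mops code or the script used for this applet; the `Region` tuple, the `extend_region` helper, the example coordinates, and the clamping to position 1 and to an optional chromosome length are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """A genomic interval (hypothetical structure; 1-based, inclusive)."""
    chrom: str
    start: int
    end: int


PAD = 30  # bp added to each side, as described in the text


def extend_region(region: Region, pad: int = PAD,
                  chrom_length: Optional[int] = None) -> Region:
    """Return a copy of `region` extended by `pad` bp on each side.

    Clamping at the chromosome ends is an assumption of this sketch;
    the source does not say how edge cases were handled.
    """
    start = max(1, region.start - pad)
    end = region.end + pad
    if chrom_length is not None:
        end = min(end, chrom_length)
    return Region(region.chrom, start, end)


# Made-up example feature, not taken from the soybean annotation.
exon = Region("Gm01", 1000, 1250)
padded = extend_region(exon)
print(padded)
```

Note that a region clamped at a chromosome end is no longer symmetric around the original feature, so any later step that subtracts a fixed 30 bp would need to account for this.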
After running cn.mops, the results were back-transformed (regions were reduced by 30 bp on each side), assembled, and merged with annotation files.
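A minimal Python sketch of the back-transformation and merge follows. It assumes, for illustration only, that results and annotations are lists of dictionaries and that they are matched on chromosome, start, and end once the 30 bp padding has been removed; the field names (`chrom`, `start`, `end`, `copy_number`, `id`) and the example values are hypothetical and do not reflect the actual cn.mops output or annotation file format.

```python
PAD = 30  # bp that was added to each side before running the algorithm


def shrink(start, end, pad=PAD):
    """Undo the padding: move each boundary back in by `pad` bp."""
    return start + pad, end - pad


def merge_with_annotation(results, annotation, pad=PAD):
    """Back-transform result coordinates and attach matching annotation.

    Matching on exact (chrom, start, end) is an assumption of this sketch.
    """
    index = {(a["chrom"], a["start"], a["end"]): a for a in annotation}
    merged = []
    for r in results:
        start, end = shrink(r["start"], r["end"], pad)
        feature = index.get((r["chrom"], start, end))
        merged.append({
            "chrom": r["chrom"],
            "start": start,
            "end": end,
            "copy_number": r["copy_number"],
            # None when no annotated feature has these coordinates
            "feature_id": feature["id"] if feature else None,
        })
    return merged


# Made-up example records, not real data.
annotation = [{"chrom": "Gm01", "start": 1000, "end": 1250, "id": "exon_1"}]
results = [{"chrom": "Gm01", "start": 970, "end": 1280, "copy_number": 3}]
merged = merge_with_annotation(results, annotation)
print(merged)
```

Removing the padding first is what lets the result coordinates line up with the annotated feature boundaries again.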
Plots in this applet were generated using ggplot2 [6] and are rendered interactively using Shiny [7].
1. Specht JE, Williams JH. Contribution of genetic technology to soybean productivity—Retrospect and prospect. Genetic contributions to yield gains of five major crop plants. Crop Science Society of America; American Society of Agronomy; 1984;49–74.
2. Fox CM, Cary TR, Colgrove AL, Nafziger ED, Haudenshield JS, Hartman GL, Specht JE, Diers BW. Estimating soybean genetic gain for yield in the northern United States—Influence of cropping history. Crop Science. The Crop Science Society of America, Inc. 2013;53:2473–2482.
3. Wu TD, Nacu S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics. Oxford Univ Press; 2010;26:873–881.
4. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, others. The Sequence Alignment/Map format and SAMtools. Bioinformatics. Oxford Univ Press; 2009;25:2078–2079.
5. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, others. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research. Cold Spring Harbor Lab; 2010;20:1297–1303.
6. Wickham H. ggplot2: Elegant graphics for data analysis [Internet]. Springer New York; 2009. Available from: http://had.co.nz/ggplot2/book.
7. RStudio, Inc. Shiny: Web application framework for R [Internet]. 2014. Available from: http://www.rstudio.com/shiny/.