The effect of strand bias in Illumina short-read sequencing data-Reference-Cited by-同舟云学术

The effect of strand bias in Illumina short-read sequencing data

Published:2012-11-24 Issue:1 Volume:13 Page:
ISSN:1471-2164
Container-title:BMC Genomics
language:en
Short-container-title:BMC Genomics

Author:

Guo Yan,Li Jiang,Li Chung-I,Long Jirong,Samuels David C,Shyr Yu

Abstract

Abstract Background When using Illumina high throughput short read data, sometimes the genotype inferred from the positive strand and negative strand are significantly different, with one homozygous and the other heterozygous. This phenomenon is known as strand bias. In this study, we used Illumina short-read sequencing data to evaluate the effect of strand bias on genotyping quality, and to explore the possible causes of strand bias. Result We collected 22 breast cancer samples from 22 patients and sequenced their exome using the Illumina GAIIx machine. By comparing the consistency between the genotypes inferred from this sequencing data with the genotypes inferred from SNP chip data, we found that, when using sequencing data, SNPs with extreme strand bias did not have significantly lower consistency rates compared to SNPs with low or no strand bias. However, this result may be limited by the small subset of SNPs present in both the exome sequencing and the SNP chip data. We further compared the transition and transversion ratio and the number of novel non-synonymous SNPs between the SNPs with low or no strand bias and those with extreme strand bias, and found that SNPs with low or no strand bias have better overall quality. We also discovered that the strand bias occurs randomly at genomic positions across these samples, and observed no consistent pattern of strand bias location across samples. By comparing results from two different aligners, BWA and Bowtie, we found very consistent strand bias patterns. Thus strand bias is unlikely to be caused by alignment artifacts. We successfully replicated our results using two additional independent datasets with different capturing methods and Illumina sequencers. Conclusion Extreme strand bias indicates a potential high false-positive rate for SNPs.

Publisher

Springer Science and Business Media LLC

Subject

Genetics,Biotechnology

Link

https://link.springer.com/content/pdf/10.1186/1471-2164-13-666.pdf

Reference14 articles.

1. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al: The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010, 20 (9): 1297-1303. 10.1101/gr.107524.110.

2. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25 (16): 2078-2079. 10.1093/bioinformatics/btp352.

3. Guo Y, Long J, He J, Li CI, Cai Q, Shu XO, Zheng W, Li C: Exome sequencing generates high quality data in non-target regions. BMC Genomics. 2012, 13 (1): 194-10.1186/1471-2164-13-194.

4. Zheng W, Long J, Gao YT, Li C, Zheng Y, Xiang YB, Wen W, Levy S, Deming SL, Haines JL, et al: Genome-wide association study identifies a new breast cancer susceptibility locus at 6q25.1. Nat Genet. 2009, 41 (3): 324-328. 10.1038/ng.318.

5. Zheng W, Chow WH, Yang G, Jin F, Rothman N, Blair A, Li HL, Wen W, Ji BT, Li Q, et al: The Shanghai Women's Health Study: rationale, study design, and baseline characteristics. Am J Epidemiol. 2005, 162 (11): 1123-1131. 10.1093/aje/kwi322.

Cited by 98 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Genomic reproducibility in the bioinformatics era;Genome Biology;2024-08-09

2. CD59 gene: 143 haplotypes of 22,718 nucleotides length by computational phasing in 113 individuals from different ethnicities;Transfusion;2024-05-30

3. Crykey: Rapid identification of SARS-CoV-2 cryptic mutations in wastewater;Nature Communications;2024-05-28

4. Analysis of somatic mutations in whole blood from 200,618 individuals identifies pervasive positive selection and novel drivers of clonal hematopoiesis;Nature Genetics;2024-05-14

5. High throughput AS LNA qPCR method for the detection of a specific mutation in poliovirus vaccine strains;Vaccine;2024-04