Librarian: A quality control tool to analyse sequencing library compositions

Published:2024-01-24 Issue: Volume:11 Page:1122
ISSN:2046-1402
Container-title:F1000Research
language:en
Short-container-title:F1000Res

Author:

Vashishtha Kartavya, Gaud Caroline, Andrews Simon, Krueger Christel

Abstract

Background: Robust analysis of DNA sequencing data needs to include a set of quality control steps to ensure that technical bias is kept to a minimum. A metric easily obtained is the frequency of each of the nucleobases for each position across all sequencing reads. Here, we explore the differences in nucleobase compositions of various library types produced by standard experimental methodologies.

Methods: We obtained the compositions of nearly 3000 publicly available datasets and subjected them to Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction for a two-dimensional representation of their composition characteristics.

Results: We find that most library types result in a specific composition profile. We use this to give an estimate of how strongly the composition of a test library resembles the profiles of previously published libraries, and how likely the test sample is to be of a particular type. We introduce Librarian, a user-friendly web application and command line tool which enables checking base compositions of test libraries against known library types.

Conclusions: Library preparation methods strongly influence the per-position nucleobase content. By comparing test libraries to a database of previously published library types we can make predictions regarding the library preparation method. Librarian is a user-friendly tool to access this information for quality assurance purposes as discrepancies can flag potential irregularities very early on.
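The metric the abstract builds on, the frequency of each nucleobase at each read position, can be illustrated with a minimal Python sketch. This is not Librarian's implementation: the function name `per_position_composition`, the choice to skip ambiguous bases such as `N`, and the toy reads are all assumptions made for illustration.

```python
from collections import Counter

BASES = "ACGT"

def per_position_composition(reads, length=None):
    """Return one dict per read position giving the fraction of A, C, G and T.

    `reads` is a list of sequence strings. Bases outside ACGT (e.g. N) are
    ignored, which is an assumption of this sketch.
    """
    if length is None:
        length = max((len(r) for r in reads), default=0)
    counts = [Counter() for _ in range(length)]
    for read in reads:
        for pos, base in enumerate(read[:length].upper()):
            if base in BASES:
                counts[pos][base] += 1
    profile = []
    for c in counts:
        total = sum(c.values())
        # Positions with no called bases get all-zero fractions.
        profile.append({b: (c[b] / total if total else 0.0) for b in BASES})
    return profile

# Hypothetical toy reads, not real sequencing data.
reads = ["ACGT", "ACGA", "TCGT", "ACNT"]
profile = per_position_composition(reads)
for pos, fractions in enumerate(profile, start=1):
    print(pos, fractions)
```

Flattening such a profile into one vector per library gives the kind of input that a dimensionality reduction like UMAP could then project into two dimensions.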

Funder

Biotechnology and Biological Sciences Research Council

Publisher

F1000 Research Ltd

Link

https://f1000research.com/articles/11-1122/v2/pdf

Cited by 1 article.

1. An HERV-W ENV transcription in atypical memory B cells linked to COVID-19 evolution and risk for long COVID can express the encoded protein from a ribosome readthrough of mRNA from chromosome X;2024-07-03