Polymorphisms predicting phylogeny in hepatitis B virus

Author:

Lourenço José (1), McNaughton Anna L (2), Pley Caitlin (3), Obolski Uri (4, 5), Gupta Sunetra (6), Matthews Philippa C (7, 8, 9, 10)

Affiliation:

1. BioISI (Biosystems and Integrative Sciences Institute), Faculty of Sciences, University of Lisbon, Campo Grande, Lisbon 1749-016, Portugal

2. Population Health Science, Bristol Medical School, University of Bristol, 5 Tyndall Ave, Bristol BS8 1UD, UK

3. Guy’s and St Thomas’ NHS Foundation Trust, Westminster Bridge Rd, London SE1 7EH, UK

4. School of Public Health, Tel Aviv University, Tel Aviv 6997801, Israel

5. Porter School of the Environment and Earth Sciences, Tel Aviv University, Tel Aviv 6997801, Israel

6. Department of Zoology, University of Oxford, Medawar Building for Pathogen Research, South Parks Road, Oxford OX1 3SY, UK

7. The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK

8. Division of Infection and Immunity, University College London, Gower Street, London WC1E 6BT, UK

9. Department of Infectious Diseases, University College London Hospital, 250 Euston Road, London NW1 2PG, UK

10. Nuffield Department of Medicine, University of Oxford, Medawar Building for Pathogen Research, South Parks Road, Oxford OX1 3SY, UK

Abstract

Hepatitis B viruses (HBVs) are compact viruses with circular genomes of ∼3.2 kb in length. Four genes (HBx, Core, Surface, and Polymerase) generating seven products are encoded on overlapping reading frames. Ten HBV genotypes have been characterised (A–J), which may account for differences in transmission, outcomes of infection, and treatment response. However, HBV genotyping is rarely undertaken, and sequencing remains inaccessible in many settings. We set out to assess which amino acid (aa) sites in the HBV genome are most informative for determining genotype, using a machine learning approach based on random forest algorithms (RFA). We downloaded 5,496 genome-length HBV sequences from a public database, excluding recombinant sequences, regions with conserved indels, and genotypes I and J. Each gene was translated separately into aa, and the proteins were concatenated into a single sequence (length 1,614 aa). Using RFA, we searched for aa sites predictive of genotype and assessed covariation among the sites with a mutual information–based method. We were able to discriminate confidently between genotypes A–H using ten aa sites. Half of these sites (5/10) were identified in Polymerase (Pol), of which 4/5 were in the spacer domain and one in reverse transcriptase. A further 4/10 sites were located in Surface protein and a single site in HBx. There were no informative sites in Core. Properties of the aa were generally not conserved between genotypes at informative sites. Among the highest co-varying pairs of sites, there were fifty-five pairs that included one of these ‘top ten’ sites. Overall, we have shown that RFA analysis is a powerful tool for identifying aa sites that predict the HBV lineage, with an unexpectedly high number of such sites in the spacer domain, which has conventionally been viewed as unimportant for structure or function. Our results improve ease of genotype prediction from limited regions of HBV sequences and may have future applications in understanding HBV evolution.
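
As an illustration of the kind of site-level scoring the abstract describes, the following is a minimal Python sketch, not the authors' pipeline. It swaps in a simpler technique: instead of random forest importance, each alignment column is scored by its mutual information with the genotype label, and the same function measures covariation between two columns. The toy sequences, genotype labels, and the names `entropy`, `mutual_information`, and `column` are all hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of a list of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits, for two equal-length lists."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def column(seqs, i):
    """Residues found at alignment position i across all sequences."""
    return [s[i] for s in seqs]


# Hypothetical toy alignment: (genotype label, aligned aa sequence).
# These are made-up strings, not real HBV proteins.
toy = [
    ("A", "MKLS"),
    ("A", "MKLT"),
    ("B", "MRIS"),
    ("B", "MRIT"),
    ("C", "MQLS"),
    ("C", "MQLT"),
]
genotypes = [g for g, _ in toy]
seqs = [s for _, s in toy]
n_sites = len(seqs[0])

# Score each site by how much it tells us about genotype
# (a stand-in for random forest feature importance).
site_scores = {
    i: mutual_information(column(seqs, i), genotypes) for i in range(n_sites)
}
ranked = sorted(site_scores, key=site_scores.get, reverse=True)

# Covariation between two sites uses the same measure on two columns.
covariation_1_2 = mutual_information(column(seqs, 1), column(seqs, 2))

for i in ranked:
    print(f"site {i}: {site_scores[i]:.3f} bits")
print(f"covariation between sites 1 and 2: {covariation_1_2:.3f} bits")
```

In this toy data, site 1 separates the three genotypes perfectly and ranks first, while the invariant site 0 and the genotype-independent site 3 carry no information. A random forest differs in that it evaluates sites jointly rather than one at a time, so this sketch should be read only as intuition for what an "informative site" means.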

Funder

FCiências.ID

University College London Hospitals NIHR Biomedical Research Centre

National Institute for Health Research Research Capability Funding

Francis Crick Institute

Wellcome Trust

Publisher

Oxford University Press (OUP)

Subject

Virology, Microbiology
