NanoBERT: An Extremely Compact Language Model

Authors:

Maity Krishanu (1), Chaulwar Amit Tulsidas (1), Vala Vanraj (1), Guntur Ravi Sankar (1)

Affiliation:

1. Samsung R&D Bangalore, India

Publisher:

ACM

References: 36 articles.

1. Amit Chaulwar, Lukas Malik, Maciej Krajewski, Felix Reichel, Leif-Nissen Lundbæk, Michael Huth, and Bartlomiej Matejczyk. 2022. Extreme compression of sentence-transformer ranker models: faster inference, longer battery life, and less storage on edge devices. ArXiv abs/2207.12852 (2022).

2. Gabrielle Cohn, Rishika Agarwal, Deepanshu Gupta, and Siddharth Patwardhan. 2023. EELBERT: Tiny Models through Dynamic Embeddings. In EMNLP. https://arxiv.org/abs/2310.20144

3. Ona De Gibert, Naiara Perez, Aitor García-Pablos, and Montse Cuadros. 2018. Hate speech dataset from a white supremacy forum. arXiv preprint arXiv:1809.04444 (2018).

4. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).

5. Ali Edalati, Marzieh S. Tahaei, Ahmad Rashid, Vahid Partovi Nia, James J. Clark, and Mehdi Rezagholizadeh. 2021. Kronecker Decomposition for GPT Compression. CoRR abs/2110.08152 (2021). arXiv:2110.08152. https://arxiv.org/abs/2110.08152
