MasakhaNER: Named Entity Recognition for African Languages

Author:

Adelani David Ifeoluwa (1, 2), Abbott Jade (3, 2), Neubig Graham (4), D’souza Daniel (5, 2), Kreutzer Julia (6, 2), Lignos Constantine (7, 2), Palen-Michel Chester (7, 2), Buzaaba Happy (8, 2), Rijhwani Shruti (4), Ruder Sebastian (9), Mayhew Stephen (10), Azime Israel Abebe (11, 2), Muhammad Shamsuddeen H. (12, 13, 2), Emezue Chris Chinenye (14, 2), Nakatumba-Nabende Joyce (15, 2), Ogayo Perez (16, 2), Anuoluwapo Aremu (17, 2), Gitau Catherine (2), Mbaye Derguene (2), Alabi Jesujoba (18, 2), Yimam Seid Muhie (19), Gwadabe Tajuddeen Rabiu (20, 2), Ezeani Ignatius (21, 2), Niyongabo Rubungo Andre (22, 2), Mukiibi Jonathan (15), Otiende Verrah (23, 2), Orife Iroro (24, 2), David Davis (2), Ngom Samba (2), Adewumi Tosin (25, 2), Rayson Paul (21), Adeyemi Mofetoluwa (2), Muriuki Gerald (15), Anebi Emmanuel (2), Chukwuneke Chiamaka (21), Odu Nkiruka (26), Wairagala Eric Peter (15), Oyerinde Samuel (2), Siro Clemencia (2), Bateesa Tobius Saul (15), Oloyede Temilola (2), Wambui Yvonne (2), Akinode Victor (2), Nabagereka Deborah (15), Katusiime Maurice (15), Awokoya Ayodele (27, 2), MBOUP Mouhamadane (2), Gebreyohannes Dibora (2), Tilaye Henok (2), Nwaike Kelechi (2), Wolde Degaga (2), Faye Abdoulaye (2), Sibanda Blessing (28, 2), Ahia Orevaoghene (29, 2), Dossou Bonaventure F. P. (30, 2), Ogueji Kelechi (31, 2), DIOP Thierno Ibrahima (2), Diallo Abdoulaye (2), Akinfaderin Adewale (2), Marengereke Tendai (2), Osei Salomey (11, 2)

Affiliation:

1. Spoken Language Systems Group (LSV), Saarland University, Germany

2. Masakhane NLP

3. Retro Rabbit, South Africa

4. Language Technologies Institute, Carnegie Mellon University, United States

5. ProQuest, United States

6. Google Research, Canada

7. Brandeis University, United States

8. Graduate School of Systems and Information Engineering, University of Tsukuba, Japan

9. DeepMind, United Kingdom

10. Duolingo, United States

11. African Institute for Mathematical Sciences (AIMS-AMMI), Ethiopia

12. University of Porto, Portugal

13. Bayero University, Kano, Nigeria

14. Technical University of Munich, Germany

15. Makerere University, Kampala, Uganda

16. African Leadership University, Rwanda

17. University of Lagos, Nigeria

18. Max Planck Institute for Informatics, Germany

19. LT Group, Universität Hamburg, Germany

20. University of Chinese Academy of Sciences, China

21. Lancaster University, United Kingdom

22. University of Electronic Science and Technology of China, China

23. United States International University - Africa (USIU-A), Kenya

24. Niger-Volta LTI

25. Luleå University of Technology, Sweden

26. African University of Science and Technology, Abuja, Nigeria

27. University of Ibadan, Nigeria

28. Namibia University of Science and Technology, Namibia

29. Instadeep, Nigeria

30. Jacobs University Bremen, Germany

31. University of Waterloo, Canada

Abstract

We take a step towards addressing the underrepresentation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Human-Computer Interaction, Communication


Cited by 22 articles.
