Author:
Sithabisiwe Manokore Anita, Gondo Monica
Abstract
The growth of digital information in many languages, including Shona, highlights the importance of effective text summarization techniques for making information accessible and usable. This work addresses a major gap in natural language processing (NLP) for Shona, a language that is widely spoken in Zimbabwe and its surrounding areas but has received little research attention. The main obstacles to Shona text summarization are the lack of pre-trained language models designed specifically for Shona, the complexity of Shona's morphology, and the scarcity of annotated datasets.[1] The goal of this research is to develop and adapt contemporary machine learning methods for effective Shona text summarization in order to address these issues. By gathering and analyzing texts from a variety of sources, such as news stories, scholarly papers, and social media, we produced large annotated corpora. These datasets were used to fine-tune existing NLP models, such as Transformer-based architectures, so that they account for Shona's specific linguistic traits. To address the morphological and syntactic complexities of Shona, our solution combines statistical and rule-based techniques. Compared with baseline methods, the results show a significant improvement in the relevance and accuracy of Shona text summaries. In addition to serving as a starting point for further NLP research in underrepresented languages, the resulting models support Shona speakers in business, education, and media. This research contributes to the broader field of NLP by encouraging inclusivity and linguistic diversity, showcasing the potential for cross-lingual breakthroughs, and emphasizing the ethical development of technology.
Publisher
International Journal of Innovative Science and Research Technology
Cited by
2 articles.