Author:
Shruti A. Gadewar, Prof. P. H. Pawar
Abstract
With the rapid expansion of the internet, data volume has grown exponentially, producing large numbers of documents that carry many kinds of information. This data is both structured and unstructured, ranging from large datasets to formatted text and unformatted content. The abundance of unstructured data is especially hard to manage effectively, and classifying it manually is impractical, so automated solutions are needed. In this paper, we propose using machine learning techniques, in particular the BERT model, to classify documents based on contextual understanding, offering a more efficient and accurate approach to handling this volume of data.