Authors:
Satya Mohan Chowdary G, T Ganga Bhavani, D Konda Babu, B Prasanna Rani, K Sireesha
Abstract
Word embeddings are essential for providing input features to deep models in language tasks such as text classification and sequence labeling. Many word embedding techniques have been proposed over the past decade, and they can be broadly divided into classic and context-based embeddings. In this study, both forms of embedding are analyzed for text classification using a downstream network architecture with two encoders, CNN and BiLSTM. Four benchmark classification datasets, covering single-label and multi-label tasks and a range of average sample lengths, are selected to evaluate the effect of word embeddings on different datasets. The evaluation results, reported with confidence intervals, show that CNN consistently outperforms BiLSTM, especially on datasets where document context is less predictive of class membership; CNN is therefore recommended over BiLSTM for such document classification tasks. For classic word embeddings, concatenating several embeddings or increasing their dimensionality does not substantially improve performance, although marginal gains appear in a few cases. In contrast, the context-based embeddings ELMo and BERT are also investigated, with BERT showing better overall performance, particularly on datasets with longer documents. Both context-based embeddings perform better on short-document datasets, whereas no significant improvement is observed on longer ones. In conclusion, this study emphasizes the importance of word embeddings and their impact on downstream tasks, highlighting the advantages of BERT over ELMo, especially for longer documents, and of CNN over BiLSTM for certain document classification scenarios.
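The abstract does not include the authors' code, but a minimal sketch of the downstream setup it describes is given below: a pretrained word-embedding layer feeding a CNN encoder for single-label or multi-label text classification. This is an illustrative PyTorch implementation under assumed hyperparameters (embedding dimension, filter sizes, number of classes); in the study, the embedding layer would be initialised from a classic embedding or replaced by ELMo/BERT vectors, and a BiLSTM encoder could be substituted for the CNN.

# Minimal sketch (not the authors' released code) of a CNN text classifier over word embeddings.
import torch
import torch.nn as nn

class CNNTextClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, num_filters=100,
                 kernel_sizes=(3, 4, 5), num_classes=4):
        super().__init__()
        # Embedding table; would be initialised from word2vec/GloVe-style vectors,
        # or bypassed entirely when contextual ELMo/BERT features are used.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes]
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Convolution + global max-pooling over time for each kernel size.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        logits = self.fc(torch.cat(pooled, dim=1))
        # Train with CrossEntropyLoss for single-label tasks,
        # or BCEWithLogitsLoss for multi-label tasks.
        return logits

# Usage sketch: a batch of 8 padded sequences of length 50 from a 10k-word vocabulary.
model = CNNTextClassifier(vocab_size=10_000)
logits = model(torch.randint(1, 10_000, (8, 50)))
print(logits.shape)  # torch.Size([8, 4])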