Semi-Supervised Lexicon-Aware Embedding for News Article Time Estimation

Author:

Usman Ahmed (1), Jerry Chun-Wei Lin (1), Gautam Srivastava (2), Unil Yun (3)

Affiliation:

1. Western Norway University of Applied Sciences, Norway

2. Brandon University, Canada

3. Sejong University, South Korea

Abstract

Temporal Information Retrieval (TIR) has become increasingly popular in the information retrieval community. Documents that focus on the time surrounding their publication are more likely to be accurate and to contain information relevant to the reader. In this study, we explore the inverted pyramid paradigm by extracting temporal expressions from news documents, normalizing their values, and scoring them according to their position within the text. We present a lexicon expansion method that takes WordNet as input. It enriches the lexicon by grouping words with similar meanings, which can improve the accuracy of event detection algorithms, and it can also add new words and phrases, expanding the vocabulary. A classifier built on a pre-trained network is trained on each tagged dataset. The partially trained model, together with the original labeled data, is then used to process a pool of unlabeled data and assign high-confidence pseudo-labels. When the classifier predicts the correct label for one data sample, the pseudo-labels of other samples are updated, and vice versa. Labeling the unlabeled inputs in this way takes several rounds; at the end of the process, the predictions of the different matching classifiers are combined. To evaluate the proposed solutions, we conducted experiments on 4,500 online news articles relevant to temporal retrieval. LSTM, BiLSTM, and BERT models, with and without lexicon expansion, were assessed using log loss and relative entropy (Kullback-Leibler (KL) divergence). A jointly trained semi-supervised model achieved a mean KL divergence of 0.89 and F1 scores of 0.74 for temporal events and 0.63 for non-temporal events. Besides alleviating data sparsity and enabling the training of more complex networks, this technique can also serve as an alternative to data augmentation methods.
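The position-based scoring of temporal expressions can be illustrated with a minimal sketch. It assumes details the abstract does not give: only two date formats are recognized (ISO `YYYY-MM-DD` and `Month D, YYYY`), sentences are split on terminal punctuation, and the sentence at index i contributes a hypothetical weight of 1/(1 + i), so dates near the top of the article (the lead of the inverted pyramid) dominate. The helpers `extract_dates` and `estimate_time` are illustrative names, not the authors' code; a real system would rely on a full temporal tagger.

```python
import re
from collections import defaultdict
from datetime import date
from typing import List, Optional

# Hypothetical month table for normalizing "March 5, 2021"-style mentions.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}
ISO_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
TEXT_RE = re.compile(r"\b([A-Z][a-z]+) (\d{1,2}), (\d{4})\b")


def _normalize(year: int, month: int, day: int) -> Optional[str]:
    """Return an ISO 8601 date string, or None if the date is invalid."""
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return None


def extract_dates(sentence: str) -> List[str]:
    """Find the two supported date patterns in one sentence and normalize them."""
    found = []
    for y, m, d in ISO_RE.findall(sentence):
        found.append(_normalize(int(y), int(m), int(d)))
    for month_name, d, y in TEXT_RE.findall(sentence):
        month = MONTHS.get(month_name.lower())
        if month is not None:
            found.append(_normalize(int(y), month, int(d)))
    return [iso for iso in found if iso is not None]


def estimate_time(article: str) -> Optional[str]:
    """Pick the normalized date with the highest position-weighted score."""
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    scores = defaultdict(float)
    for index, sentence in enumerate(sentences):
        weight = 1.0 / (1 + index)  # hypothetical decay: earlier sentences weigh more
        for iso in extract_dates(sentence):
            scores[iso] += weight
    if not scores:
        return None
    return max(scores, key=scores.get)
```

For an article whose first and third sentences mention March 5, 2021 and whose second and fourth mention 2020-11-02, the sketch returns "2021-03-05", because its score (1 + 1/3) exceeds that of the other date (1/2 + 1/4). The decay function is the main design knob; a steeper decay trusts the lead paragraph more.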
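The WordNet-based lexicon expansion can be sketched as grouping each seed term with its synonyms and taking the union of the groups as the expanded vocabulary. To keep the example runnable without downloading the WordNet corpus, the synonym source is a pluggable function, and the example uses a small hand-written map (`TOY_SYNONYMS`, hypothetical). `wordnet_synonyms` shows how the same lookup could be wired to NLTK's WordNet interface (`wordnet.synsets`, `Synset.lemmas`, `Lemma.name`). The one-hop expansion and the seed terms are assumptions, not the authors' procedure.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical hand-written synonym map standing in for WordNet.
TOY_SYNONYMS = {
    "recently": ["lately", "of_late"],
    "soon": ["shortly", "presently"],
}


def toy_synonyms(word: str) -> Iterable[str]:
    """Look up synonyms in the toy map (empty if the word is unknown)."""
    return TOY_SYNONYMS.get(word, [])


def wordnet_synonyms(word: str) -> Iterable[str]:
    """Synonyms from WordNet via NLTK (requires nltk and the 'wordnet' corpus)."""
    from nltk.corpus import wordnet  # imported lazily so the sketch runs without NLTK
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            yield lemma.name()


def expand_lexicon(seeds: Iterable[str],
                   synonyms: Callable[[str], Iterable[str]]) -> Dict[str, Set[str]]:
    """Group each lower-cased seed term with its one-hop synonyms."""
    groups = {}
    for seed in seeds:
        key = seed.lower()
        group = {key}
        for name in synonyms(key):
            # WordNet joins multi-word lemmas with underscores, e.g. "of_late".
            group.add(name.lower().replace("_", " "))
        groups[key] = group
    return groups


def vocabulary(groups: Dict[str, Set[str]]) -> Set[str]:
    """Flatten the synonym groups into the expanded lexicon."""
    return set().union(*groups.values())
```

With the seeds "Recently", "soon", and "now", the toy map yields three groups and a seven-word vocabulary; swapping `toy_synonyms` for `wordnet_synonyms` keeps the same interface but draws on the full WordNet synsets, which typically adds many more (and noisier) candidates.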
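The pseudo-labeling loop can be sketched as generic self-training: fit a model on the labeled data, predict labels for the unlabeled pool, move only predictions whose confidence reaches a threshold into the training set, and repeat for several rounds. In this sketch a nearest-centroid classifier with softmax-normalized distances stands in for the paper's pre-trained networks, the 0.9 threshold and five-round limit are hypothetical, and the final step of combining several classifiers' predictions is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Model = Dict[str, List[float]]  # class label -> centroid


def fit_centroids(points: Sequence[Point], labels: Sequence[str]) -> Model:
    """Average the points of each class (a stand-in for training a network)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_proba(x: Point, model: Model) -> Dict[str, float]:
    """Softmax over negative squared distances to each class centroid."""
    scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c)) for y, c in model.items()}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}  # shift for stability
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def self_train(labeled_x: Sequence[Point], labeled_y: Sequence[str],
               unlabeled_x: Sequence[Point], threshold: float = 0.9,
               max_rounds: int = 5) -> Tuple[Model, List[Point]]:
    """Iteratively add high-confidence pseudo-labels; return model and leftovers."""
    xs, ys, pool = list(labeled_x), list(labeled_y), list(unlabeled_x)
    for _ in range(max_rounds):
        model = fit_centroids(xs, ys)  # retrain on labeled + pseudo-labeled data
        remaining = []
        for x in pool:
            probs = predict_proba(x, model)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                xs.append(x)
                ys.append(label)  # accept the high-confidence pseudo-label
            else:
                remaining.append(x)
        if len(remaining) == len(pool):
            break  # no new pseudo-labels this round
        pool = remaining
    return fit_centroids(xs, ys), pool
```

Points close to a class centroid are absorbed in the first round, while a point midway between the classes stays unlabeled because its confidence never reaches the threshold. Raising the threshold trades coverage of the unlabeled pool for fewer wrong pseudo-labels.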
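The two evaluation measures named in the abstract can be written out directly. Log loss is the mean negative log-probability assigned to the true class, and relative entropy (KL divergence) is D_KL(P || Q) = Σ_i p_i log(p_i / q_i) between a reference distribution P and a predicted distribution Q. The sketch assumes natural logarithms (nats) and clips probabilities with a small epsilon; the abstract does not state which base or smoothing the authors used, and `mean_kl` is a hypothetical helper for averaging over articles.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """D_KL(P || Q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def mean_kl(refs: Sequence[Sequence[float]], preds: Sequence[Sequence[float]]) -> float:
    """Average KL divergence over paired reference/predicted distributions."""
    pairs = list(zip(refs, preds))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


def log_loss(y_true: Sequence[int], probs: Sequence[Sequence[float]],
             eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of the true class indices."""
    total = -sum(math.log(max(row[label], eps)) for label, row in zip(y_true, probs))
    return total / len(y_true)
```

KL divergence is asymmetric, so the argument order matters: here P is the reference distribution and Q the model's prediction.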

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science
