Deep Learning Approaches for Big Data-Driven Metadata Extraction in Online Job Postings

Published: 2023-10-25 Issue: 11 Volume: 14 Page: 585
ISSN: 2078-2489
Container-title: Information
Language: en
Short-container-title: Information

Author:

Skondras Panagiotis¹, Zotos Nikos², Lagios Dimitris¹, Zervas Panagiotis¹, Giotopoulos Konstantinos C.², Tzimas Giannis¹

Affiliation:

1. Data and Media Laboratory, Department of Electrical and Computer Engineering, University of Peloponnese, 22131 Tripolis, Greece

2. Department of Management Science and Technology, University of Patras, 26334 Patras, Greece

Abstract

This article presents a study on the multi-class classification of job postings using machine learning algorithms. With the growth of online job platforms, there has been an influx of labor market data. Machine learning, particularly NLP, is increasingly used to analyze and classify job postings. However, the effectiveness of these algorithms largely hinges on the quality and volume of the training data. In our study, we propose a multi-class classification methodology for job postings, drawing on AI models such as text-davinci-003 and the quantized versions of Falcon 7B (Falcon), Wizardlm 7B (Wizardlm), and Vicuna 7B (Vicuna) to generate synthetic datasets. These synthetic data are employed in two use-case scenarios: (a) exclusively as training datasets composed of synthetic job postings (for situations where no real data are available) and (b) as an augmentation method to bolster underrepresented job title categories. To evaluate our proposed method, we relied on two well-established approaches: the feedforward neural network (FFNN) and the BERT model. Both the use cases and training methods were assessed against a genuine job posting dataset to gauge classification accuracy. Our experiments substantiated the benefits of using synthetic data to enhance job posting classification. In the first scenario, the models' performance matched, and occasionally exceeded, that of models trained on real data. In the second scenario, the augmented classes performed better in most instances. This research confirms that AI-generated datasets can enhance the efficacy of NLP algorithms, especially in the domain of multi-class classification of job postings. While data augmentation can boost model generalization, its impact varies. It is especially beneficial for simpler models like the FFNN. BERT, due to its context-aware architecture, also benefits from augmentation but sees limited improvement. Selecting the right type and amount of augmentation is essential.
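
The augmentation idea in use case (b) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `augment_minority_classes`, the `target_per_class` parameter, and the sample postings are all hypothetical, and the synthetic postings are assumed to have been generated by a language model beforehand.

```python
from collections import Counter, defaultdict


def augment_minority_classes(real, synthetic, target_per_class):
    """Top up underrepresented labels in `real` with postings from `synthetic`.

    Both inputs are lists of (text, label) pairs. Labels that already have
    at least `target_per_class` real postings receive no synthetic data.
    (Hypothetical helper for illustration only.)
    """
    counts = Counter(label for _, label in real)

    # Group the pre-generated synthetic postings by job title category.
    pool = defaultdict(list)
    for text, label in synthetic:
        pool[label].append(text)

    augmented = list(real)
    for label, texts in pool.items():
        deficit = max(0, target_per_class - counts.get(label, 0))
        augmented.extend((text, label) for text in texts[:deficit])
    return augmented


# Invented example data: "accountant" is the underrepresented category.
real = [
    ("Build REST services in Java", "software engineer"),
    ("Maintain CI pipelines", "software engineer"),
    ("Design backend APIs", "software engineer"),
    ("Prepare monthly ledgers", "accountant"),
]
synthetic = [
    ("Reconcile bank statements", "accountant"),
    ("File quarterly tax returns", "accountant"),
    ("Audit expense reports", "accountant"),
    ("Write unit tests in Python", "software engineer"),
]

train = augment_minority_classes(real, synthetic, target_per_class=3)
label_counts = Counter(label for _, label in train)
print(label_counts)
```

Passing an empty list as `real` corresponds to use case (a), where the training set consists only of synthetic postings. The resulting `train` list would then be fed to whichever classifier is being evaluated.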

Publisher

MDPI AG

Subject

Information Systems

Link

https://www.mdpi.com/2078-2489/14/11/585/pdf

Cited by 1 article.

1. Automatic Extraction and Cluster Analysis of Natural Disaster Metadata Based on the Unified Metadata Framework. ISPRS International Journal of Geo-Information, 2024-06-14.