Mining Career Paths from Large Resume Databases

Author:

Lappas Theodoros1ORCID

Affiliation:

1. Stevens Institute of Technology, Hoboken, New Jersey

Abstract

The emergence of online professional platforms, such as LinkedIn and Indeed, has led to unprecedented volumes of rich resume data that have revolutionized the study of careers. One of the most prevalent problems in this space is the extraction of prototype career paths from a workforce. Previous research has consistently relied on a two-step approach to tackle this problem. The first step computes the pairwise distances between all the career sequences in the database. The second step uses the distance matrix to create clusters, with each cluster representing a different prototype path. As we demonstrate in this work, this approach faces two significant challenges when applied on large resume databases. First, the overwhelming diversity of job titles in the modern workforce prevents the accurate evaluation of distance between career sequences. Second, the clustering step of the standard approach leads to highly heterogeneous clusters, due to its inability to handle categorical sequences and sensitivity to outliers. This leads to non-representative centroids and spurious prototype paths that do not accurately represent the actual groups in the workforce. Our work addresses these two challenges and has practical implications for the numerous researchers and practitioners working on the analysis of career data across domains.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3