Affiliation:
1. Stevens Institute of Technology, Hoboken, New Jersey
Abstract
The emergence of online professional platforms, such as LinkedIn and Indeed, has led to unprecedented volumes of rich resume data that have revolutionized the study of careers. One of the most prevalent problems in this space is the extraction of prototype career paths from a workforce. Previous research has consistently relied on a two-step approach to tackle this problem. The first step computes the pairwise distances between all the career sequences in the database. The second step uses the distance matrix to create clusters, with each cluster representing a different prototype path. As we demonstrate in this work, this approach faces two significant challenges when applied on large resume databases. First, the overwhelming diversity of job titles in the modern workforce prevents the accurate evaluation of distance between career sequences. Second, the clustering step of the standard approach leads to highly heterogeneous clusters, due to its inability to handle categorical sequences and sensitivity to outliers. This leads to non-representative centroids and spurious prototype paths that do not accurately represent the actual groups in the workforce. Our work addresses these two challenges and has practical implications for the numerous researchers and practitioners working on the analysis of career data across domains.
Publisher
Association for Computing Machinery (ACM)
Cited by
6 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献