On the Validity of Using Webpage Texts to Identify the Target Population of a Survey: An Application to Detect Online Platforms

Author:

Daas Piet (1), Hassink Wolter (2, 3), Klijs Bart (4)

Affiliation:

1. Mathematics and Computer Science Department, Eindhoven University of Technology, Eindhoven, The Netherlands

2. Utrecht University School of Economics, Utrecht University, Utrecht, The Netherlands

3. IZA, Institute of Labour Economics, Bonn, Germany

4. Sector Culture, Tourism and Technology, Statistics Netherlands, The Hague, The Netherlands

Abstract

A statistical classification model was developed to identify online platform organizations based on the texts on their website. The model was subsequently used to identify all (potential) platform organizations with a website included in the Dutch Business Register. The empirical outcomes of the statistical model were plausible in terms of the words and the bimodal distribution of fitted probabilities, but the results indicated an overestimation of the number of platform organizations. Next, the external validity of the outcomes was investigated through a survey of the organizations that were identified as a platform organization by the statistical classification model. The response by the organizations to the survey confirmed a substantial number of type-I errors. Furthermore, it revealed a positive association between the fitted probability of the text-based classification model and the organization’s response to the survey question on being an online platform organization. The survey results indicated that the text-based classification model can be used to obtain a subpopulation of potential platform organizations from the entire population of businesses with a website. This subpopulation may form a good starting point to study platform organizations in more detail.

Publisher

SAGE Publications
