Development of an Internet-based Product-related Child Injury Textual Data Platform (IPCITDP) in China (Preprint)

Author:

Xiao Wangxin, Cheng Peixia, Schwebel David C., Yang Lei, Zhao Min, Zhao Shuying, Hu Guoqing

Abstract

BACKGROUND

Internet-based media stories provide valuable information about emerging risks that can inform the prevention and control of product-related child injury, but critical methodological challenges and the high cost of data acquisition and processing restrict their practical use by stakeholders.

OBJECTIVE

To develop an automated data platform for gathering, processing, and transforming textual media stories into structured data that can support identification of new product-related child injury risks, development of research priorities, and modification of prevention policy and practice.

METHODS

The data platform was constructed through literature reviews and multi-round research group discussions. Components developed included standard search strategies, filtering criteria, textual document classification, information extraction standards, and a keyword dictionary. Ten thousand manually labelled media stories were used to validate the textual document classification model, which was established using Bidirectional Encoder Representations from Transformers (BERT). Multiple information extraction methods, all based on natural language processing algorithms, were adopted to extract data for 29 structured variables from media stories. These methods were evaluated through manual validation of 1,000 media stories about product-related child injury. We mapped the geographic distribution of media sources and media-reported product-related child injury events.
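The manual validation step described above amounts to comparing each extracted variable against a human-assigned label and reporting accuracy per variable. The following is a minimal sketch of that comparison, not the authors' evaluation code. The function name `per_variable_accuracy` and the variable names `product_type` and `child_age` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional

# One record maps a variable name to its value (None when nothing was found).
Record = Dict[str, Optional[str]]


def per_variable_accuracy(
    extracted: List[Record],
    manual: List[Record],
    variables: List[str],
) -> Dict[str, float]:
    """Return, for each variable, the share of stories where the
    extracted value exactly matches the manually assigned value."""
    if len(extracted) != len(manual):
        raise ValueError("extracted and manual must describe the same stories")
    if not manual:
        raise ValueError("at least one validated story is required")

    accuracy: Dict[str, float] = {}
    for var in variables:
        matches = sum(
            1 for ext, ref in zip(extracted, manual) if ext.get(var) == ref.get(var)
        )
        accuracy[var] = matches / len(manual)
    return accuracy


# Hypothetical two-story example (variable names are assumptions).
extracted_records = [
    {"product_type": "magnetic beads", "child_age": "3"},
    {"product_type": "scooter", "child_age": None},
]
manual_records = [
    {"product_type": "magnetic beads", "child_age": "3"},
    {"product_type": "scooter", "child_age": "7"},
]
scores = per_variable_accuracy(
    extracted_records, manual_records, ["product_type", "child_age"]
)
```

Exact string matching is the simplest possible criterion; free-text variables would likely need a more tolerant comparison in practice.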

RESULTS

We developed an internet-based product-related child injury textual data platform, IPCITDP, that automatically collects, stores, and processes online media stories concerning product-related child injury in China every day. The IPCITDP is composed of four layers: data search and acquisition, data processing, data storage, and data application. External validation showed high performance for the BERT textual document classification model we established (accuracy = 0.9703) and for the combined information extraction strategies (accuracy > 0.70 for 25 of the 29 variables). As of December 31, 2022, the IPCITDP had collected 28,979 eligible product-related child injury reports from 9,935 news media websites or social media platform accounts, which were located in all 31 provinces of mainland China and covered over 97% of the prefecture-level cities. The product-related child injury cases collected by the IPCITDP were typically reported several months or years earlier than official announcements about the corresponding risks. The IPCITDP added data on 15 supplementary variables that are not covered by the national product-related injury surveillance system. Two examples demonstrate the value of the IPCITDP in supplying additional data and providing early detection of emerging epidemiological signals concerning product-related child injury: one for magnetic bead-related child injury and the other for electric self-balancing scooter-related child injury.
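The four-layer structure named above (acquisition, processing, storage, application) can be pictured as a chain of small functions. The sketch below is illustrative only and makes several assumptions: a plain keyword match stands in for the BERT classifier and the extraction methods, an in-memory list stands in for the storage layer, and every name (`Story`, `KEYWORDS`, `acquire`, `process`, `store`, `summarize`, `run_pipeline`) is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Story:
    url: str
    text: str


# Hypothetical keyword dictionary: product label -> trigger terms.
KEYWORDS: Dict[str, List[str]] = {
    "magnetic beads": ["magnetic bead"],
    "self-balancing scooter": ["self-balancing scooter"],
}


def acquire(raw: Iterable[Story]) -> List[Story]:
    """Layer 1 (search and acquisition): keep one story per URL."""
    seen = set()
    unique: List[Story] = []
    for story in raw:
        if story.url not in seen:
            seen.add(story.url)
            unique.append(story)
    return unique


def process(stories: Iterable[Story]) -> List[Dict[str, str]]:
    """Layer 2 (processing): keep relevant stories and extract a product label.
    A keyword match is used here as a stand-in for a trained classifier."""
    records: List[Dict[str, str]] = []
    for story in stories:
        lowered = story.text.lower()
        if "child" not in lowered:
            continue
        for product, terms in KEYWORDS.items():
            if any(term in lowered for term in terms):
                records.append({"url": story.url, "product": product})
                break
    return records


def store(records: List[Dict[str, str]], database: List[Dict[str, str]]) -> None:
    """Layer 3 (storage): append structured records to an in-memory store."""
    database.extend(records)


def summarize(database: List[Dict[str, str]]) -> Counter:
    """Layer 4 (application): count stored reports per product."""
    return Counter(record["product"] for record in database)


def run_pipeline(raw: Iterable[Story], database: List[Dict[str, str]]) -> Counter:
    store(process(acquire(raw)), database)
    return summarize(database)


# Hypothetical input: one duplicate URL and one irrelevant story.
db: List[Dict[str, str]] = []
counts = run_pipeline(
    [
        Story("u1", "A child swallowed magnetic beads at home."),
        Story("u1", "A child swallowed magnetic beads at home."),
        Story("u2", "Child falls from self-balancing scooter."),
        Story("u3", "Stock market closes higher."),
    ],
    db,
)
```

Keeping each layer as a separate function mirrors the layered description and lets any one stage be swapped out, for example replacing the keyword match with a trained model, without touching the others.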

CONCLUSIONS

The IPCITDP provides product-related child injury data that can support early detection of new product-related child injury characteristics in China and supplement existing data sources to reduce the burden of product-related injury among Chinese children.

Publisher

JMIR Publications Inc.
