A Privacy-Preserving Multilingual Comparable Corpus Construction Method in Internet of Things
Published:2024-02-17
Issue:4
Volume:12
Page:598
ISSN:2227-7390
Container-title:Mathematics
language:en
Short-container-title:Mathematics
Author:
Weng Yu (1), Dong Shumin (2), Chaomurilige Chaomurilige (1)
Affiliation:
1. Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China
2. School of Chinese Ethnic Minority Languages and Literatures, Minzu University of China, Beijing 100081, China
Abstract
With the expansion of Internet of Things (IoT) and artificial intelligence (AI) technologies, multilingual scenarios are becoming more common, and applications based on multilingual resources are on the rise. Alongside the need to construct multilingual resources, privacy protection issues such as data leakage are increasingly prominent. Comparable corpora are important for multilingual information processing in IoT. However, privacy-preserving multilingual comparable corpora are rare, so there is an urgent need to construct such a resource. This paper proposes a method for constructing a privacy-preserving multilingual comparable corpus, taking Chinese–Uighur–Tibetan IoT-based news as an example. The method maps texts in different languages to a unified language vector space to avoid exposing sensitive information, then calculates the similarity between texts in different languages and uses it as a comparability index to construct comparable relations. Through a decision-making mechanism of minimizing the impossibility, it identifies comparable corpus pairs of multilingual texts based on chapter size, realizing the construction of a privacy-preserving Chinese–Uighur–Tibetan comparable corpus (CUTCC). Evaluation experiments demonstrate the effectiveness of the proposed provable method, which outperforms in accuracy rate by 77%, recall rate by 34%, and F value by 47.17%. The CUTCC provides valuable privacy-preserving data resources and language services for multilingual situations in IoT.
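The pipeline described in the abstract (shared vector space, similarity as a comparability index, then a pairing decision) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the vectors are toy values standing in for texts already mapped to the unified space, cosine similarity stands in for the paper's similarity measure, and a best-match-above-threshold rule is swapped in for the paper's "minimizing the impossibility" mechanism, which the abstract does not specify. The names `cosine`, `pair_comparable`, `f_value`, and the `0.8` threshold are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_comparable(src_vecs, tgt_vecs, threshold=0.8):
    """For each source text, keep its most similar target text if the
    similarity reaches the threshold. Only vectors are compared; the raw
    text never enters this step. The threshold rule is an assumed stand-in
    for the paper's decision mechanism."""
    pairs = []
    for src_id, u in src_vecs.items():
        best_id, best_sim = None, -1.0
        for tgt_id, v in tgt_vecs.items():
            sim = cosine(u, v)
            if sim > best_sim:
                best_id, best_sim = tgt_id, sim
        if best_id is not None and best_sim >= threshold:
            pairs.append((src_id, best_id, best_sim))
    return pairs


def f_value(precision, recall):
    """Harmonic mean of precision and recall (the usual F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy vectors (hypothetical): two Chinese and two Uighur documents.
zh = {"zh-1": [0.9, 0.1, 0.0], "zh-2": [0.0, 0.2, 0.9]}
ug = {"ug-1": [0.8, 0.2, 0.1], "ug-2": [0.5, 0.5, 0.5]}

pairs = pair_comparable(zh, ug)
for src_id, tgt_id, sim in pairs:
    print(src_id, tgt_id, round(sim, 3))

# If "accuracy rate" is read as precision, the reported figures are
# consistent with the F1 formula: F(0.77, 0.34) rounds to 47.17%.
print(round(f_value(0.77, 0.34) * 100, 2))
```

In this toy run only `zh-1` and `ug-1` are paired; `zh-2` has no target above the assumed threshold and is left unpaired. The last line is only a consistency check on the three reported figures under the stated reading, not a reproduction of the evaluation.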
Funder
the National Key Research and Development Program of China