End-Users Know Best: Identifying Undesired Behavior of Alexa Skills Through User Review Analysis

Authors:

Mohammed Aldeen¹, Jeffrey Young¹, Song Liao¹, Tsu-Yao Chang¹, Long Cheng¹, Haipeng Cai², Xiapu Luo³, Hongxin Hu⁴

Affiliations:

1. Clemson University, USA

2. Washington State University, USA

3. Hong Kong Polytechnic University, Hong Kong

4. University at Buffalo, USA

Abstract

The Amazon Alexa marketplace has grown rapidly in recent years as third-party developers create large amounts of content and publish it directly to the skills store. Despite this growth, several security and usability concerns have been reported that may not be caught during the vetting phase. User reviews, however, can offer valuable insight into the security and privacy, quality, and usability of skills. To better understand how these problematic skills affect end-users, we introduce ReviewTracker, a tool that identifies and classifies semantically negative user reviews to flag likely malicious, policy-violating, or malfunctioning behavior in Alexa skills. ReviewTracker employs a pre-trained FastText classifier to identify different types of undesired skill behavior. We collected over 700,000 user reviews spanning 6 years, more than 200,000 of which carry negative sentiment. ReviewTracker identified 17,820 reviews reporting violations of Alexa policy requirements across 2,813 skills, and 131,855 reviews describing various types of user frustration associated with 9,294 skills. In addition, we developed a dynamic skill testing framework that uses ChatGPT to run two distinct types of tests on Alexa skills: one interacts with skills through software-based simulation to explore their actual behavior, and the other issues real voice commands to understand the factors that may cause discrepancies between intended skill functionality and user experience. Ranking skills by their number of undesired-behavior reviews, we tested the top identified problematic skills and detected more than 228 skills violating at least one policy requirement. Our results demonstrate that user reviews can serve as a valuable means of identifying undesired skill behaviors.
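The abstract reports two kinds of aggregate figures: how many flagged reviews fall into a category together with how many distinct skills they span (e.g., 17,820 policy-violation reviews across 2,813 skills), and a ranking of skills by flagged-review count that is used to choose which skills to test dynamically. The minimal Python sketch below illustrates only that bookkeeping, assuming each review has already been labeled by an upstream classifier such as ReviewTracker's FastText model. The `Review` type, the `summarize` and `rank_skills` helpers, the label strings, and the sample reviews are all hypothetical and are not taken from the paper; the classifier itself is not shown.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    skill_id: str  # hypothetical identifier of the skill the review belongs to
    text: str
    label: str     # hypothetical category assigned by an upstream classifier


def summarize(reviews: list[Review], label: str) -> tuple[int, int]:
    """Count reviews carrying `label` and the distinct skills they cover."""
    flagged = [r for r in reviews if r.label == label]
    return len(flagged), len({r.skill_id for r in flagged})


def rank_skills(reviews: list[Review], label: str, top_k: int) -> list[tuple[str, int]]:
    """Return the top_k skills by number of `label` reviews, ties broken by skill id."""
    counts = Counter(r.skill_id for r in reviews if r.label == label)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Invented toy data for illustration only; not drawn from the paper's dataset.
SAMPLE_REVIEWS = [
    Review("skill-a", "It asked for my home address out of nowhere", "policy_violation"),
    Review("skill-a", "Kept listening after I said stop", "policy_violation"),
    Review("skill-b", "Never opens, just says there was a problem", "frustration"),
    Review("skill-c", "Played an ad in the middle of the story", "policy_violation"),
    Review("skill-b", "Works fine for me", "other"),
]

if __name__ == "__main__":
    print(summarize(SAMPLE_REVIEWS, "policy_violation"))       # (3, 2)
    print(rank_skills(SAMPLE_REVIEWS, "policy_violation", 2))  # [('skill-a', 2), ('skill-c', 1)]
```

Sorting on the negated count and then on the skill id keeps the ranking deterministic when counts tie; the abstract does not say how ties or the cut-off for "top" skills were actually handled.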

Funder

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

