IoT-Based Voice-Controlled Smart Homes with Source Separation Based on Deep Learning

Author:

Alshammri Ghalib H. (1)

Affiliation:

1. Department of Computer Science, Community College, King Saud University, Riyadh 11437, Saudi Arabia

Abstract

The widespread availability of advanced computing technologies has highlighted the relevance of artificial intelligence (AI) applications in almost all sectors of the economy. With voice control processing built into many Internet of Things (IoT) devices, many of these devices can be operated by spoken commands. A speech-controlled environment may contain several devices, each used for a separate activity, yet all of them may capture and process the same command at the same time, which may be the case if the devices can communicate with one another. Devices that are not equipped to handle a command ignore it, so only the device designed for the activity processes the command and carries it out. However, when all of the voice-controlled devices capture the command through their microphones, it is likely to mix with sounds from other sources, such as the television, music systems, and household activities. During command identification, any sound mixed with the primary command is regarded as noise and has to be removed. The proposed system gives primary consideration to the direction of arrival (DOA) of the sound waves, and its performance is evaluated. Based on the estimated angle of arrival, a specific room impulse response (RIR) is selected from a set of predefined RIRs as the room acoustic characteristic, and source separation is carried out using independent component analysis (ICA). The separated command speech signals are then analyzed, and speech features are extracted from them using Mel-frequency cepstral coefficients (MFCC). A support vector machine (SVM) classifier then assigns these features to multiple classes. The performance of the SVM classifier is compared with that of a number of other commonly used machine learning classifiers, including decision trees (DT). In the evaluation, the multiclass SVM classifier achieves an accuracy of 91%. Classification accuracy can be improved further with a classifier based on a probabilistic neural network (PNN). This classifier is made up of three layers: a gated recurrent unit (GRU) layer, a long short-term memory (LSTM) layer, and a layer that integrates the two. It is reported to attain an accuracy of 94.5%, which is higher than that of the multiclass SVM classifier.
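The step from a DOA estimate to a selected RIR can be illustrated with a minimal sketch. The abstract does not state how the DOA is estimated or how the RIR set is organized, so everything below is an assumption made for illustration: a two-microphone pair, a far-field source, a delay estimated by brute-force cross-correlation, and a hypothetical `rir_bank` keyed by reference angle with placeholder taps. The function names are also hypothetical.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def estimate_delay(x, y, max_lag):
    """Return the lag in samples (positive when y trails x) that
    maximizes the cross-correlation of x and y."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            x[n] * y[n + lag]
            for n in range(len(x))
            if 0 <= n + lag < len(y)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def doa_degrees(delay_samples, sample_rate, mic_spacing):
    """Far-field angle from broadside for a two-microphone pair
    (assumed geometry, not taken from the paper)."""
    ratio = SPEED_OF_SOUND * delay_samples / (sample_rate * mic_spacing)
    ratio = max(-1.0, min(1.0, ratio))  # keep asin inside its domain
    return math.degrees(math.asin(ratio))


def select_rir(angle, rir_bank):
    """Pick the entry of a predefined RIR set whose reference angle
    is closest to the estimated angle."""
    nearest = min(rir_bank, key=lambda ref: abs(ref - angle))
    return nearest, rir_bank[nearest]


# Synthetic demo: microphone B hears the same noise burst 2 samples late.
rng = random.Random(0)
mic_a = [rng.gauss(0.0, 1.0) for _ in range(256)]
mic_b = [0.0, 0.0] + mic_a[:-2]

# Hypothetical RIR set: reference angle in degrees -> placeholder filter taps.
rir_bank = {a: [1.0] for a in (-60, -30, 0, 30, 60)}

delay = estimate_delay(mic_a, mic_b, max_lag=5)
angle = doa_degrees(delay, sample_rate=16000, mic_spacing=0.1)
ref_angle, rir = select_rir(angle, rir_bank)
print(delay, round(angle, 1), ref_angle)
```

In this sketch the selected RIR would then feed the source separation stage; the ICA, MFCC, and classification steps described in the abstract are not shown here.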

Publisher

Hindawi Limited

Subject

Electrical and Electronic Engineering, Instrumentation, Control and Systems Engineering

Cited by 2 articles.

1. Benchmarking MLCommons Tiny Audio Denoising with Deployability Constraints;2024 IEEE Gaming, Entertainment, and Media Conference (GEM);2024-06-05

2. Applications of AI-empowered electric vehicles for voice recognition in Asian and Austronesian languages;Artificial Intelligence-Empowered Modern Electric Vehicles in Smart Grid Systems;2024
