Abstract
Artificial intelligence is now part of daily routine and is becoming deeply entrenched in
our lives. One of its most popular and rapidly advancing technologies is speech
recognition, which forms an integral part of the broader concept of multimodal data
handling. Multimodal data encompasses voice, audio, and text, and processing it calls for
a multifaceted approach to understanding information. This paper presents
the development of a multimodal handling interface leveraging Google API technologies.
The interface aims to facilitate seamless integration and management of diverse data
modalities, including text, audio, and video, within a unified platform. By using
Google API functionalities such as natural language processing, speech
recognition, and video analysis, the interface offers enhanced capabilities for
processing, analysing, and interpreting multimodal data. The paper discusses the design
and implementation of the interface, highlighting its features and functionalities.
Furthermore, it explores potential applications and future directions for using the
interface in various domains, including healthcare, education, and multimedia content
creation. Overall, the development of the multimodal handling interface based on Google
APIs represents a step towards advancing multimodal data processing and
improving the user experience of interacting with diverse data sources.
Publisher
Lviv Polytechnic National University