Affiliation:
1. University of California, Berkeley, CA
2. International Computer Science Institute and University of California, Berkeley, CA
Abstract
In this new era dominated by consumer-produced media, there is a high demand for web-scalable solutions to multimedia content analysis. A compelling approach to making applications scalable is to explicitly map their computation onto parallel platforms. However, developing efficient parallel implementations and fully utilizing the available resources remain a challenge due to the increased code complexity, limited portability, and required low-level knowledge of the underlying hardware. In this article, we present PyCASP, a Python-based framework that automatically maps computation from Python application code onto a variety of parallel platforms. PyCASP is designed using a systematic, pattern-oriented approach to offer a single software development environment for multimedia content analysis applications. Using PyCASP, applications can be prototyped in a couple of hundred lines of Python code and automatically scale to modern parallel processors. Applications written with PyCASP are portable to a variety of parallel platforms and efficiently scale from a single desktop Graphics Processing Unit (GPU) to an entire cluster with a small change to application code. To illustrate our approach, we present three multimedia content analysis applications that use our framework: a state-of-the-art speaker diarization application, a content-based music recommendation system based on the Million Song Dataset, and a video event detection system for consumer-produced videos. We show that across this wide range of applications, our approach achieves the goal of automatic portability and scalability while allowing easy prototyping in a high-level language and delivering the performance of low-level optimized code.
Funder
U.C. Discovery (Award #DIG07-10227)
Microsoft (Award #024263)
Intel (Award #024894)
Samsung
Par Lab affiliates National Instruments
Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20066
NVIDIA
Nokia
Oracle
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications, Hardware and Architecture
Cited by
6 articles.
1. Application-Oriented Content Quality Analysis of Data Using Python;Lecture Notes in Networks and Systems;2022
2. Detecting Events in Streaming Multimedia with Big Data Techniques;2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP);2016-02
3. A distributed architecture to integrate ontological knowledge into information extraction;International Journal of Grid and Utility Computing;2016
4. Parallel Massive Clustering of Discrete Distributions;ACM Transactions on Multimedia Computing, Communications, and Applications;2015-06-02
5. A pattern oriented approach for designing scalable analytics applications (invited talk);Proceedings of the 2nd Workshop on Parallel Programming for Analytics Applications;2015-02-08