Affiliation:
1. Rice University, Houston, TX
2. Florida State University, Tallahassee, FL
Abstract
As data analytics has become an important application for modern data management systems, a new category of data management system has appeared recently: the scalable linear algebra system. We argue that a parallel or distributed database system is actually an excellent platform upon which to build such functionality. Most relational systems already support cost-based optimization, which is vital to scaling linear algebra computations, and it is well known how to make relational systems scalable.
We show that by making just a few changes to a parallel/distributed relational database system, such a system can become a competitive platform for scalable linear algebra. Taken together, our results should at least raise the possibility that brand-new systems designed from the ground up to support scalable linear algebra are not absolutely necessary, and that such systems could instead be built on top of existing relational technology.
Publisher
Association for Computing Machinery (ACM)