Affiliation:
1. National Taiwan University
Abstract
Recent advances in linear classification have shown that for applications such as document classification, the training process can be extremely efficient. However, most existing training methods are designed under the assumption that the data can be stored in main memory. They cannot easily be applied to data larger than the memory capacity because of the cost of random disk access. We propose and analyze a block minimization framework for data larger than the memory size. At each step, a block of data is loaded from the disk and handled by a learning method. We investigate two implementations of the proposed framework, for primal and dual SVMs respectively. Because the data cannot fit in memory, many design considerations differ from those for traditional algorithms. We discuss our framework and compare it with existing approaches that can handle data larger than memory. Experiments on data sets 20 times larger than the available memory demonstrate the effectiveness of the proposed method.
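To make the block minimization idea concrete, the following is a minimal sketch of the dual-SVM variant under simplifying assumptions that are not from the paper: block_files names hypothetical .npz files, each holding a dense feature matrix X and labels y in {-1, +1}, whereas the actual implementation operates on sparse data and keeps each block's dual variables on disk alongside the block. Only one block is loaded into memory at a time; the primal vector w is kept in memory across blocks.

```python
import numpy as np

def block_minimization(block_files, n_features, C=1.0, outer_iters=10):
    """Block minimization sketch for the dual of an L1-loss linear SVM.

    Assumes the training set is pre-split into blocks stored on disk,
    so only one block resides in memory at a time.  The primal vector
    w = sum_i alpha_i * y_i * x_i is maintained across blocks.
    """
    w = np.zeros(n_features)
    alphas = {f: None for f in block_files}  # per-block dual variables

    for _ in range(outer_iters):
        for f in block_files:
            data = np.load(f)            # load one block from disk
            X, y = data["X"], data["y"]  # assumed dense here for brevity
            alpha = alphas[f] if alphas[f] is not None else np.zeros(len(y))

            # One pass of dual coordinate descent over the block.
            for i in np.random.permutation(len(y)):
                xi, yi = X[i], y[i]
                Qii = xi.dot(xi)
                if Qii == 0.0:
                    continue
                G = yi * w.dot(xi) - 1.0  # partial gradient of the dual
                # Projected gradient respecting the box constraint 0 <= alpha_i <= C.
                if alpha[i] == 0.0:
                    PG = min(G, 0.0)
                elif alpha[i] == C:
                    PG = max(G, 0.0)
                else:
                    PG = G
                if abs(PG) > 1e-12:
                    new_alpha = min(max(alpha[i] - G / Qii, 0.0), C)
                    w += (new_alpha - alpha[i]) * yi * xi
                    alpha[i] = new_alpha

            alphas[f] = alpha  # an out-of-core run would write alpha back to disk
    return w
```

Note that only w (one float per feature) must stay resident; the dual variables scale with the number of instances, which is why keeping them with their blocks on disk matters when the data is many times larger than memory.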
Funder
National Science Council, Taiwan
Publisher
Association for Computing Machinery (ACM)
Cited by
36 articles.