Affiliation:
1. Rakuten Asia Pte Ltd
2. National University of Singapore, Singapore
Abstract
E-commerce platforms categorize their products into a multi-level taxonomy tree with thousands of leaf categories. Conventional methods for product categorization are typically based on machine learning classification algorithms. These algorithms take product information as input (e.g., titles and descriptions) to classify a product into a leaf category. In this article, we propose a new paradigm based on machine translation. In our approach, we translate a product’s natural language description into a sequence of tokens representing a root-to-leaf path in a product taxonomy. In our experiments on two large real-world datasets, we show that our approach achieves better predictive accuracy than a state-of-the-art classification system for product categorization. In addition, we demonstrate that our machine translation models can propose meaningful new paths between previously unconnected nodes in a taxonomy tree, thereby transforming the taxonomy into a directed acyclic graph. We discuss how the resultant taxonomy directed acyclic graph promotes user-friendly navigation, and how it is more adaptable to new products.
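The abstract's two ideas are treating the root-to-leaf path as the target "sentence" of a translation model and merging predicted paths back into the taxonomy so that it becomes a DAG. Both can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation. The translation model itself is omitted. The boundary tokens `<s>`/`</s>`, the class `TaxonomyGraph`, its methods, and the category names are all hypothetical. The acyclicity check (reject a new edge u → v when v can already reach u, rolling back any edges already added for that path) is one simple way to keep the graph acyclic. The abstract does not say how the authors enforce this.

```python
from collections import defaultdict

# Hypothetical boundary tokens for the target sequence (an assumption).
BOS, EOS = "<s>", "</s>"


def path_to_tokens(path):
    """Turn a root-to-leaf category path into a target token sequence.

    Each category name becomes one token. A seq2seq model (not shown)
    would be trained to emit this sequence from a product's text."""
    return [BOS] + list(path) + [EOS]


def tokens_to_path(tokens):
    """Strip the boundary tokens from a decoded token sequence."""
    return [t for t in tokens if t not in (BOS, EOS)]


class TaxonomyGraph:
    """Hypothetical taxonomy stored as parent -> set(children) edges."""

    def __init__(self, paths):
        self.children = defaultdict(set)
        # The initial paths are assumed to form a tree, so no cycle check here.
        for path in paths:
            for u, v in zip(path, path[1:]):
                self.children[u].add(v)

    def new_edges(self, path):
        """Edges of a predicted path that the current taxonomy lacks."""
        return [(u, v) for u, v in zip(path, path[1:])
                if v not in self.children.get(u, ())]

    def parents(self, node):
        """All nodes with an edge into `node` (more than one => DAG, not tree)."""
        return {u for u, kids in self.children.items() if node in kids}

    def reaches(self, src, dst):
        """Iterative depth-first search: is dst reachable from src?"""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.children.get(node, ()))
        return False

    def add_path_if_acyclic(self, path):
        """Add a predicted path edge by edge; roll back if any edge closes a cycle."""
        added = []
        for u, v in zip(path, path[1:]):
            if v in self.children.get(u, ()):
                continue  # edge already present in the taxonomy
            if self.reaches(v, u):  # adding u -> v would create a cycle
                for a, b in added:
                    self.children[a].discard(b)
                return False
            self.children[u].add(v)
            added.append((u, v))
        return True


if __name__ == "__main__":
    taxonomy = TaxonomyGraph([
        ["Root", "Electronics", "Audio", "Headphones"],
        ["Root", "Sports", "Fitness", "Trackers"],
    ])
    # A decoded path that links a previously unconnected pair of nodes.
    predicted = tokens_to_path(["<s>", "Root", "Sports", "Fitness", "Headphones", "</s>"])
    print(taxonomy.new_edges(predicted))         # [('Fitness', 'Headphones')]
    print(taxonomy.add_path_if_acyclic(predicted))  # True
    print(sorted(taxonomy.parents("Headphones")))   # ['Audio', 'Fitness']
```

After the new path is accepted, "Headphones" has two parents, which is exactly what turns the tree into a directed acyclic graph. Checking and inserting edges one at a time (with rollback) matters because two new edges that are each harmless alone can together close a cycle.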
Publisher
Association for Computing Machinery (ACM)
Subject
General Computer Science, Management Information Systems
Cited by 7 articles.