1. Balasubramanian K, Lebanon G (2012) The landmark selection method for multiple output prediction. arXiv preprint arXiv:1206.6479
2. Bhatia K, Jain H, Kar P, Varma M, Jain P (2015) Sparse local embeddings for extreme multi-label classification. In: Advances in Neural Information Processing Systems, pp 730–738
3. Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078
4. Cisse MM, Usunier N, Artieres T, Gallinari P (2013) Robust bloom filters for large multilabel classification tasks. In: Advances in Neural Information Processing Systems, pp 1851–1859
5. Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805