COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search

Published: 2023-08-04
Container-title: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Author:

Ibrahim Shibal¹, Chen Wenyu¹, Hazimeh Hussein², Ponomareva Natalia², Zhao Zhe³, Mazumder Rahul¹

Affiliation:

1. Massachusetts Institute of Technology, Cambridge, MA, USA

2. Google Research, New York, NY, USA

3. Google DeepMind, Mountain View, CA, USA

Funder

Google

Office of Naval Research

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3580305.3599278

References: 61 articles.

1. Ryan Prescott Adams and Richard S. Zemel. 2011. Ranking via Sinkhorn Propagation. https://doi.org/10.48550/ARXIV.1106.1925

2. Mikel Artetxe, Shruti Bhosale, Naman Goyal, et al. 2022. Efficient Large Scale Language Modeling with Mixtures of Experts. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.

3. Sparsity Constrained Nonlinear Optimization: Optimality Conditions and Algorithms

4. Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, and Doina Precup. 2016. Conditional Computation in Neural Networks for faster models. In International Conference on Learning Representations Workshop Track. https://openreview.net/forum?id=B1ckMDqlg

5. Yoshua Bengio, Nicholas Léonard, and Aaron C. Courville. 2013. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. CoRR abs/1308.3432 (2013). arXiv:1308.3432 http://arxiv.org/abs/1308.3432