1. Ryan Prescott Adams and Richard S. Zemel. 2011. Ranking via Sinkhorn Propagation. https://doi.org/10.48550/ARXIV.1106.1925
2. Mikel Artetxe, Shruti Bhosale, Naman Goyal, et al. 2022. Efficient Large Scale Language Modeling with Mixtures of Experts. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.
3. Sparsity Constrained Nonlinear Optimization: Optimality Conditions and Algorithms
4. Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, and Doina Precup. 2016. Conditional Computation in Neural Networks for faster models. In International Conference on Learning Representations Workshop Track. https://openreview.net/forum?id=B1ckMDqlg
5. Yoshua Bengio, Nicholas Léonard, and Aaron C. Courville. 2013. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. CoRR abs/1308.3432 (2013). arXiv:1308.3432 http://arxiv.org/abs/1308.3432