Authors:
Li Jiaming, Lang Lang, Zhu Zhenlong, Wang Haozhao, Li Ruixuan, Xu Wenchao
Publisher:
Springer Nature Switzerland
References (25 articles):
1. Birgin, E., Martínez, J.: Block coordinate descent for smooth nonconvex constrained minimization. Comput. Optim. Appl. 83(1), 1–27 (2022)
2. Cheng, W., Shen, Y., Huang, L.: Adaptive factorization network: learning adaptive-order feature interactions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 3609–3616 (2020)
3. Du, N., et al.: GLaM: efficient scaling of language models with mixture-of-experts. In: International Conference on Machine Learning, pp. 5547–5569. PMLR (2022)
4. Guo, H., Tang, R., Ye, Y., Li, Z., He, X.: DeepFM: a factorization-machine based neural network for CTR prediction. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 1725–1731 (2017). https://doi.org/10.24963/ijcai.2017/239
5. Han, K., Xiao, A., Wu, E., Guo, J., Xu, C., Wang, Y.: Transformer in transformer. Adv. Neural. Inf. Process. Syst. 34, 15908–15919 (2021)