Abstract
Making deep learning models accessible on edge devices, and balancing the trade-off between latency and accuracy, are key concerns when deploying them. This paper explores DynK-Hydra, a Mixture of Experts system that trains an ensemble of multiple similar branches on data sets with a large number of classes but uses only the necessary subset of branches during inference. We achieve this by training a cohort of specialized branches (reduced-size deep networks) together with a gater/supervisor that dynamically decides which branches to use for each input. One original contribution is that the number of selected branches is set dynamically, based on how confident the gater is, whereas similar works use a static parameter for this. Another contribution is the way we ensure branch specialization: we divide the data set classes into multiple clusters, assign one cluster to each branch, and enforce the branch's specialization on that cluster through a separate loss function. We evaluate DynK-Hydra on the CIFAR-100, Food-101, CUB-200, and ImageNet32 data sets and obtain accuracy improvements of up to 4.3% over state-of-the-art ResNet models, while reducing the number of inference FLOPs by a factor of 2 to 5.5. Compared to a similar work (HydraRes), we obtain marginal accuracy improvements of up to 1.2% on pairs of architectures with comparable inference time. However, we improve inference times by up to 2.8 times compared to HydraRes.
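The abstract does not state how the gater's confidence is turned into a number of branches. As a minimal sketch, assuming a cumulative-probability threshold (a hypothetical rule with hypothetical parameters `tau` and `k_max`, not necessarily DynK-Hydra's actual criterion), dynamic branch selection for a single input could look like this:

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the gater's per-branch scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_branches(gater_logits: Sequence[float], tau: float = 0.8,
                    k_max: Optional[int] = None) -> List[int]:
    """Pick a variable number of branches for one input.

    Hypothetical rule (an assumption, not taken from the paper): sort branches
    by gater probability and keep adding them until their cumulative
    probability reaches tau, optionally capped at k_max branches. A confident
    gater therefore activates few branches; an uncertain one activates more.
    """
    probs = softmax(gater_logits)
    # Branch indices ordered from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    limit = len(order) if k_max is None else k_max
    chosen: List[int] = []
    cumulative = 0.0
    for idx in order[:limit]:
        chosen.append(idx)
        cumulative += probs[idx]
        if cumulative >= tau:
            break
    return chosen


if __name__ == "__main__":
    confident = [6.0, 0.5, 0.2, 0.1]  # gater strongly prefers branch 0
    uncertain = [1.0, 0.9, 0.8, 0.1]  # gater spreads its probability mass
    print(select_branches(confident))  # → [0]
    print(select_branches(uncertain))  # → [0, 1, 2]
```

Under this assumed rule, K is not a fixed hyperparameter: it follows from how peaked the gater's distribution is, which is the contrast the abstract draws with static-K approaches. Only the selected branches would then be evaluated, so inference cost scales with the number of branches chosen for each input.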
Publisher
Springer Science and Business Media LLC
Subject
Computational Mathematics, Engineering (miscellaneous), Information Systems, Artificial Intelligence