Towards efficient AutoML: a pipeline synthesis approach leveraging pre-trained transformers for multimodal data
Published: 2024-07-19
Issue: 9
Volume: 113
Pages: 7011-7053
ISSN: 0885-6125
Container title: Machine Learning
Language: en
Short container title: Mach Learn
Authors: Ambarish Moharil, Joaquin Vanschoren, Prabhant Singh, Damian Tamburri
Abstract
This paper introduces an Automated Machine Learning (AutoML) framework designed to efficiently synthesize end-to-end multimodal machine learning pipelines. It minimizes reliance on computationally demanding Neural Architecture Search by strategically integrating pre-trained transformer models, which unify diverse data modalities into high-dimensional embeddings and thereby streamline pipeline development. A Bayesian Optimization strategy informed by meta-learning warm-starts the pipeline synthesis, further improving computational efficiency. The methodology can produce advanced, customized multimodal pipelines within a limited computational budget. Extensive testing across 23 varied multimodal datasets demonstrates the promise and utility of the framework in diverse scenarios. The results contribute to ongoing efforts in AutoML and suggest new possibilities for efficiently handling complex multimodal data, representing a step towards more efficient and versatile tools for multimodal machine learning pipeline development.
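The core idea in the abstract is to use pre-trained transformers to map each modality into a shared embedding space, so that pipeline search operates on fixed-length vectors rather than raw multimodal inputs. Below is a minimal sketch of that general idea, assuming the Hugging Face transformers library with BERT as the text encoder, ViT as the image encoder, and simple concatenation as the fusion step; all of these are illustrative assumptions, not the authors' exact configuration.

```python
import torch
from PIL import Image
from transformers import AutoTokenizer, AutoModel, AutoImageProcessor

# Text encoder (BERT) and image encoder (ViT); both are illustrative
# checkpoint choices, not necessarily those used in the paper.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text_encoder = AutoModel.from_pretrained("bert-base-uncased")
image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
image_encoder = AutoModel.from_pretrained("google/vit-base-patch16-224-in21k")

def embed_text(sentence: str) -> torch.Tensor:
    """Map a sentence to a fixed-length vector by mean-pooling BERT tokens."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = text_encoder(**inputs)
    return out.last_hidden_state.mean(dim=1).squeeze(0)  # shape: (768,)

def embed_image(image: Image.Image) -> torch.Tensor:
    """Map an image to a fixed-length vector via the ViT [CLS] embedding."""
    inputs = image_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        out = image_encoder(**inputs)
    return out.last_hidden_state[:, 0, :].squeeze(0)  # shape: (768,)

# Fuse per-modality embeddings into one feature vector that a conventional
# (tabular) AutoML pipeline can consume; concatenation is one simple choice.
text_vec = embed_text("a dog catching a frisbee in the park")
image_vec = embed_image(Image.new("RGB", (224, 224)))  # placeholder image
fused = torch.cat([text_vec, image_vec], dim=0)  # shape: (1536,)
```

Once every modality lives in a single vector space, downstream pipeline synthesis reduces to a more standard (tabular) AutoML problem. The abstract's second ingredient, meta-learned warm-starting of Bayesian Optimization, can likewise be sketched by seeding the optimizer with configurations that performed well on similar datasets instead of random initial points; the objective function and seed configurations below are hypothetical placeholders, shown here with scikit-optimize rather than the authors' own optimizer.

```python
from skopt import gp_minimize
from skopt.space import Real, Integer

# Hypothetical two-dimensional pipeline search space.
search_space = [
    Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate"),
    Integer(16, 256, name="hidden_units"),
]

def pipeline_validation_loss(params):
    """Stand-in for training and evaluating one candidate pipeline."""
    learning_rate, hidden_units = params
    return (learning_rate - 0.01) ** 2 + ((hidden_units - 128) ** 2) / 1e4

# In a meta-learning setting these seeds would be retrieved from runs on
# similar datasets; here they are hard-coded placeholders.
warm_start_configs = [[0.01, 128], [0.05, 64]]
warm_start_losses = [pipeline_validation_loss(c) for c in warm_start_configs]

result = gp_minimize(
    pipeline_validation_loss,
    search_space,
    x0=warm_start_configs,  # seed the surrogate with prior evaluations
    y0=warm_start_losses,
    n_calls=20,
    random_state=0,
)
print("best config:", result.x, "loss:", result.fun)
```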
Publisher
Springer Science and Business Media LLC