Augmenting interpretable models with large language models during training

Published:2023-11-30 Issue:1 Volume:14 Page:
ISSN:2041-1723
Container-title:Nature Communications
language:en
Short-container-title:Nat Commun

Author:

Singh Chandan, Askari Armin, Caruana Rich, Gao Jianfeng

Abstract

Recent large language models (LLMs), such as ChatGPT, have demonstrated remarkable prediction performance for a growing array of tasks. However, their proliferation into high-stakes domains and compute-limited settings has created a burgeoning need for interpretability and efficiency. We address this need by proposing Aug-imodels, a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable prediction models. Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency and often a speed/memory improvement of greater than 1000x for inference compared to LLMs. We explore two instantiations of Aug-imodels in natural-language processing: Aug-Linear, which augments a linear model with decoupled embeddings from an LLM, and Aug-Tree, which augments a decision tree with LLM feature expansions. Across a variety of text-classification datasets, both outperform their non-augmented, interpretable counterparts. Aug-Linear can even outperform much larger models, e.g. a 6-billion parameter GPT-J model, despite having 10,000x fewer parameters and being fully transparent. We further explore Aug-imodels in a natural-language fMRI study, where they generate interesting interpretations from scientific data.
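The following is a minimal sketch of the Aug-Linear idea as the abstract describes it, not the authors' implementation. It assumes that "decoupled embeddings" means each n-gram is embedded separately and the embeddings are summed before a linear model is fit. The `embed_ngram` function is a hypothetical hash-based stand-in for an LLM embedding, and the toy texts, `DIM`, and all helper names are illustrative only.

```python
import hashlib
import math

DIM = 16  # size of the stand-in embedding (hypothetical)


def embed_ngram(ngram):
    """Hypothetical stand-in for an LLM embedding: a fixed vector derived from a hash."""
    digest = hashlib.sha256(ngram.encode("utf-8")).digest()
    return [(byte - 127.5) / 127.5 for byte in digest[:DIM]]


def extract_ngrams(text, max_n=2):
    """All word n-grams of length 1..max_n; each is embedded on its own."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def featurize(text):
    """Sum the separately computed n-gram embeddings into one feature vector."""
    total = [0.0] * DIM
    for gram in extract_ngrams(text):
        for j, value in enumerate(embed_ngram(gram)):
            total[j] += value
    return total


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_logistic(X, y, lr=0.05, epochs=200):
    """Plain stochastic gradient descent for logistic regression."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            z = max(-30.0, min(30.0, b + dot(w, x)))  # clamp to avoid overflow
            grad = 1.0 / (1.0 + math.exp(-z)) - target
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


train_texts = ["great movie", "really great acting",
               "terrible movie", "really terrible plot"]
train_labels = [1, 1, 0, 0]

# Fitting: the embedder is called here and when building the table below.
w, b = fit_logistic([featurize(t) for t in train_texts], train_labels)

# Because the model is linear in a sum of embeddings, each n-gram's
# contribution collapses to one scalar that can be stored in a table.
coef_table = {gram: dot(w, embed_ngram(gram))
              for text in train_texts for gram in extract_ngrams(text)}


def predict_logit(text):
    """Inference by table lookup only; unseen n-grams contribute 0 (a simplification)."""
    return b + sum(coef_table.get(gram, 0.0) for gram in extract_ngrams(text))


full_logit = b + dot(w, featurize("really great movie"))
table_logit = predict_logit("really great movie")
```

In this sketch `full_logit` and `table_logit` agree up to floating-point rounding, which illustrates why such a model needs no embedder at inference time: the prediction is a sum of per-n-gram scalars that can each be inspected directly.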

Publisher

Springer Science and Business Media LLC

Subject

General Physics and Astronomy; General Biochemistry, Genetics and Molecular Biology; General Chemistry; Multidisciplinary

Link

https://www.nature.com/articles/s41467-023-43713-1.pdf

References: 100 articles.

1. Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).

2. Bubeck, S. et al. Sparks of artificial general intelligence: early experiments with GPT-4. https://arxiv.org/abs/2303.12712 (2023).

3. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. https://arxiv.org/abs/1810.04805 (2018).

4. Angermueller, C., Pärnamaa, T., Parts, L. & Stegle, O. Deep learning for computational biology. Mol. Syst. Biol. 12, 878 (2016).

5. Kornblith, A. E. et al. Predictability and stability testing to assess clinical decision instrument performance for children after blunt torso trauma. PLOS Digit. Health https://doi.org/10.1371/journal.pdig.0000076 (2022).

Cited by 11 articles.

1. "The Winding Journey of Human-Machine Symbiosis": Nurse Researchers' Experiences and Perceptions of Generative Artificial Intelligence: Qualitative Study (Preprint); 2024-08-28

2. From Data to Insight: Transforming Online Job Postings into Labor-Market Intelligence; Information; 2024-08-20

3. Large language models for medicine: a survey; International Journal of Machine Learning and Cybernetics; 2024-08-19

4. Interpretable deep learning in single-cell omics; Bioinformatics; 2024-06

5. Research on the Application Methods of Large Language Model Interpretability in FinTech Scenarios; 2024 4th International Conference on Computer Communication and Artificial Intelligence (CCAI); 2024-05-24