Variable Scale Pruning for Transformer Model Compression in End-to-End Speech Recognition

Authors:

Ben Letaifa, Leila 1,2; Rouas, Jean-Luc 2

Affiliation:

1. LINEACT, UR-EA 7527, CESI Nancy, 54500 Vandœuvre-lès-Nancy, France

2. LaBRI, CNRS UMR 5800, University of Bordeaux, Bordeaux INP, 33405 Talence, France

Abstract

Transformer models are increasingly used in end-to-end speech recognition systems because of their performance. However, their substantial size poses challenges for deploying them in real-world applications. These models rely heavily on attention and feedforward layers, with the latter containing a vast number of parameters that contribute significantly to the model's memory footprint. It is therefore pertinent to consider pruning these layers to reduce the model's size. In this article, our primary focus is on the feedforward layers. We conduct a comprehensive analysis of their parameter count and distribution. Specifically, we examine the weight distribution within each layer and observe how the weight values progress across the transformer model's blocks. Our findings demonstrate a correlation between the depth of the feedforward layers and the magnitude of their weights: layers with higher weight values require less pruning. Building on this insight, we propose a novel pruning algorithm based on variable rates. This approach sets the pruning rate according to the significance and location of each feedforward layer within the network. To evaluate the new pruning method, we conduct experiments on several datasets. The results show its superiority over conventional pruning techniques such as local pruning and global pruning.
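As a rough illustration of the idea described above, the sketch below applies magnitude pruning to the feedforward sublayers of a transformer with a rate that decreases with layer depth, so that deeper layers (which tend to carry larger weights) are pruned less. This is a minimal sketch using PyTorch's pruning utilities; the linear rate schedule, the base_rate/min_rate values, and the "feed_forward" module-name filter (typical of ESPnet-style transformers) are assumptions for illustration, not the exact algorithm from the article.

```python
# Minimal sketch of variable-rate magnitude pruning for transformer
# feedforward layers. The depth-dependent rate schedule is a
# hypothetical illustration, not the paper's exact method.
import torch.nn as nn
import torch.nn.utils.prune as prune


def feedforward_linears(model: nn.Module):
    """Yield (depth_index, linear) pairs for the feedforward sublayers.

    Assumes the position-wise FFN modules contain 'feed_forward' in their
    names (ESPnet-style naming); adapt the filter to your model.
    """
    ffn_layers = [m for name, m in model.named_modules()
                  if "feed_forward" in name and isinstance(m, nn.Linear)]
    for depth, layer in enumerate(ffn_layers):
        yield depth, layer


def variable_rate_prune(model: nn.Module,
                        base_rate: float = 0.6,
                        min_rate: float = 0.2) -> None:
    """Prune shallower FFN layers more aggressively than deeper ones."""
    layers = list(feedforward_linears(model))
    n = max(len(layers) - 1, 1)
    for depth, layer in layers:
        # Rate decreases linearly from base_rate (first layer)
        # to min_rate (last layer); deeper layers keep more weights.
        rate = base_rate - (base_rate - min_rate) * depth / n
        prune.l1_unstructured(layer, name="weight", amount=rate)
        prune.remove(layer, "weight")  # make the pruning permanent
```

In contrast, local pruning would apply the same rate to every layer, and global pruning would rank all weights jointly; the variable-rate schedule sits between the two by tying the rate to each layer's position in the network.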

Funder

European Union’s Horizon 2020 Research and Innovation action

Publisher

MDPI AG

Subject

Computational Mathematics, Computational Theory and Mathematics, Numerical Analysis, Theoretical Computer Science

