1. Mehmood T, Gerevini AE, Lavelli A, Olivato M, Serina I (2023) Distilling knowledge with a teacher's multitask model for biomedical named entity recognition. Information 14(5)
2. Mehmood T, Serina I, Lavelli A, Gerevini A (2020) Knowledge distillation techniques for biomedical named entity recognition. In: Proceedings of the 4th workshop on natural language for artificial intelligence (NL4AI 2020) co-located with the 19th International conference of the Italian Association for artificial intelligence (AI*IA 2020), Anywhere, November 25–27, 2020. CEUR Workshop Proceedings, vol 2735, pp 141–156. CEUR-WS.org
3. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781
4. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
5. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, vol 1, pp 4171–4186