1. ap Dyfrig, R. (2013). Hanes y we gymraeg. http://www.tiki-toki.com/timeline/entry/84932/Hanes-y-We-Gymraeg/ Online publication.
2. Baker (2002). EMILLE, a 67-million word corpus of Indic languages: data collection, mark-up and harmonisation.
3. Berger, K.C., Hernaiz, A.G., Baroni, P., Hicks, D., Kruse, E., Quochi, V., Russo, I., Salonen, T., Sarhimaa, A., and Soria, C. (2018). The DLDP digital language survival kit. The Digital Language Diversity Project, www.dldp.eu.
4. Binding (2018). A study of semantic integration across archaeological data and reports in different languages. J. Inf. Sci.
5. Bontcheva (2013). TwitIE: an open-source information extraction pipeline for microblog text.