1. Baroni, M., Bernardini, S., 2004. BootCaT: Bootstrapping corpora and terms from the web, in: LREC, p. 1313.
2. Baroni, M., Kilgarriff, A., 2006. Large linguistically-processed web corpora for multiple languages, in: Demonstrations. URL: https://www.aclweb.org/anthology/E06-2001.
3. Clark, J., DeRose, S., et al., 1999. XML Path Language (XPath) version 1.0.
4. Devlin, J., Chang, M.W., Lee, K., Toutanova, K., 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
5. Evert, S., Kilgarriff, A., Sharoff, S. (Eds.), 2008. Victor: the Web-Page Cleaning Tool.