Publisher
Springer Nature Singapore