1. Internet Archive. 2021. Archive-It! https://archive-it.org/
2. Internet Archive. 2021. heritrix3. https://github.com/internetarchive/heritrix3
3. Niels Brügger. 2018. The Archived Web: Doing History in the Digital Age. MIT Press.
4. International Internet Preservation Consortium. 2021. The WARC Format. https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/
5. Efficient big data processing in Hadoop MapReduce