1. The ClueWeb09 dataset.
http://lemurproject.org/clueweb09/
2. The ClueWeb12 dataset.
http://lemurproject.org/clueweb12/
3. List of lists of lists.
https://en.wikipedia.org/wiki/List_of_lists_of_lists
4. Maria Sharapova.
https://en.wikipedia.org/wiki/Maria_Sharapova
5. The New York Times Annotated Corpus.
https://catalog.ldc.upenn.edu/LDC2008T19