1. An E, Ji A, Ng E (2019) Large scale video classification using both visual and audio features on YouTube-8M dataset
2. Browne P, Czirjek C, Gurrin C, Jarina R, Lee H, Marlow S, McDonald K, Murphy N, O’Connor N E, Smeaton A F et al (2002) Dublin City University video track experiments for TREC 2002. In: The Eleventh Text Retrieval Conference. NIST
3. Chaisorn L, Chua T-S, Koh C-K, Zhao Y, Xu H, Feng H, Tian Q (2003) A two-level multi-modal approach for story segmentation of large news video corpus. In: TRECVID Conference (Gaithersburg, Washington DC, November 2003). Published online at http://www.nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html
4. Charlet D, Damnati G, Bouchekif A, Douib A (2015) Fusion of speaker and lexical information for topic segmentation: A co-segmentation approach. In: International Conference on Acoustics, Speech and Signal Processing. IEEE, pp 5261–5265
5. Chatzis S P, Demiris Y (2013) The infinite-order conditional random field model for sequential data modeling. IEEE Trans Pattern Anal Mach Intell 35(6):1523–1534