1. Graph2Vid: Flow graph to video grounding for weakly-supervised multi-step localization;Dvornik;arXiv preprint,2022
2. MPNet: Masked and permuted pre-training for language understanding;Song;Advances in Neural Information Processing Systems,2020
3. Action Modifiers: Learning From Adverbs in Instructional Videos
4. Learning to Segment Actions from Visual and Language Instructions via Differentiable Weak Sequence Alignment
5. Unsupervised Procedure Learning via Joint Dynamic Summarization