1. Rosyiq, Ahmad and Hayah, Aina Rahmah and Hidayanto, Achmad Nizar and Naisuty, Meisuchi and Suhanto, Agus and Budi, Nur Fitriah Ayuning (2019). Information extraction from Twitter using DBpedia ontology: Indonesia tourism places. In Proceedings of the 2019 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS), pp. 91–96. IEEE.
2. Li, Xiaoli and Liu, Bing (2003). Learning to classify texts using positive and unlabeled data. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), vol. 3, pp. 587–592. Citeseer.
3. Ghosh, Iman (2020). Ranked: The 100 most spoken languages around the world. Visual Capitalist, Feb. 2020. https://www.visualcapitalist.com/100-most-spoken-languages/
4. Wang, Chenguang and Liu, Xiao and Chen, Zui and Hong, Haoyun and Tang, Jie and Song, Dawn (2021). Zero-Shot Information Extraction as a Unified Text-to-Triple Translation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1225–1238.
5. Suan Lim, Ee and Qi Leong, Wei and Thanh Nguyen, Ngan and Adhista, Dea and Ming Kng, Wei and Chandra Tjh, William and Purwarianti, Ayu (2023). ICON: Building a Large-Scale Benchmark Constituency Treebank for the Indonesian Language. In Dakota, Daniel and Evang, Kilian and Kübler, Sandra and Levin, Lori (eds.), Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023), pp. 37–53. Association for Computational Linguistics, Washington, D.C., March 2023. https://aclanthology.org/2023.tlt-1.5