Abstract
We present a new technique for distributional semantic modeling that uses a neural network-based approach to learn distributed term representations (term embeddings), yielding term vector space models as a result. The technique is inspired by a recent ontology-related approach to the identification of terms (term extraction) and of relations between them (relation extraction), called semantic pre-processing technology (SPT), which exploits different types of contextual knowledge (syntactic, terminological, semantic, etc.). Our method relies on automatic term extraction from natural language texts and the subsequent formation of problem-oriented or application-oriented (and deeply annotated) text corpora in which the fundamental entity is the term (both non-compositional and compositional terms). This lets us move from distributed word representations (word embeddings) to distributed term representations (term embeddings). The main practical result of our work is a development kit (a set of toolkits delivered as web-service APIs and a web application) that provides all the routines needed for the basic linguistic pre-processing and the semantic pre-processing of natural language texts in Ukrainian, for subsequent training of term vector space models.
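The shift from word embeddings to term embeddings described above can be approximated in a corpus pre-processing step: once multiword terms have been extracted, they are merged into single corpus tokens so that any standard word-embedding trainer (e.g. a word2vec-style model) learns one vector per term rather than per word. A minimal sketch of such a merging step, assuming a pre-built term lexicon and a hypothetical helper name `merge_terms` (neither taken from the paper):

```python
def merge_terms(tokens, term_lexicon):
    """Greedily replace known multiword terms with single underscore-joined tokens,
    preferring the longest matching term at each position."""
    merged = []
    i = 0
    n = len(tokens)
    # Longest term in the lexicon bounds the search window.
    max_len = max((len(t) for t in term_lexicon), default=1)
    while i < n:
        match = None
        for span in range(min(max_len, n - i), 1, -1):
            candidate = tuple(tokens[i:i + span])
            if candidate in term_lexicon:
                match = candidate
                break
        if match:
            merged.append("_".join(match))  # one token per term
            i += len(match)
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Toy lexicon of extracted compositional terms (illustrative only).
lexicon = {("vector", "space", "model"), ("term", "embedding")}
sentence = ["a", "term", "embedding", "in", "a", "vector", "space", "model"]
print(merge_terms(sentence, lexicon))
# → ['a', 'term_embedding', 'in', 'a', 'vector_space_model']
```

The merged corpus can then be fed unchanged to an off-the-shelf embedding trainer; each underscore-joined token receives its own vector, which is the practical effect of treating the term as the fundamental corpus entity.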
Publisher
National Academy of Sciences of Ukraine (Co. LTD Ukrinformnauka)