Affiliation:
1. Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Taiwan
2. Department of Management Science, National Chiao Tung University, Taiwan
3. School of Big Data Management, Soochow University, Taiwan
Abstract
Technical and knowledge documents, such as research papers, patents, and requests for quotation (RFQs), are important knowledge references for multiple purposes. For example, enterprises and R&D institutions often need to conduct literature and patent searches and analyses before, during, and after R&D and commercialization. These knowledge discovery processes help them identify prior art related to their current R&D efforts, so that they avoid duplicating research or infringing upon existing intellectual property rights (IPRs). Documents commonly contain many synonyms (i.e., words and phrases with near-identical meanings), which can hinder search results if queries do not account for them. For instance, a “freedom-to-operate” (FTO) patent search may miss related patents if synonyms are not taken into consideration. This research develops methodologies for generating domain-specific “word” and “phrase” synonym dictionaries using machine learning. Both dictionaries are generated and validated using more than 2000 solar power related patents as the test document set. The test results show that, in the solar power domain, both the word-level and phrase-level dictionaries identify synonyms effectively and thus significantly improve patent search results.
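As a rough illustration of why synonyms matter for patent search, the sketch below expands a query with a small synonym dictionary before matching documents. It is not the authors' method: the dictionary here is hand-written and hypothetical (the paper generates its dictionaries with machine learning), and the names `SYNONYMS`, `expand_query`, and `matches`, as well as the sample documents, are assumptions made only for this example.

```python
# Hypothetical hand-written synonym dictionary. The paper builds its
# dictionaries with machine learning, which is not reproduced here.
SYNONYMS = {
    "solar cell": {"photovoltaic cell", "pv cell"},
    "panel": {"module"},
}


def expand_query(terms, synonyms):
    """Return, for each query term, the sorted group of the term plus its synonyms."""
    expanded = []
    for term in terms:
        key = term.lower()
        group = {key} | synonyms.get(key, set())
        expanded.append(sorted(group))
    return expanded


def matches(document, expanded):
    """A document matches when every term group has at least one variant in it.

    Matching is plain substring search, which is enough for a sketch.
    """
    text = document.lower()
    return all(any(variant in text for variant in group) for group in expanded)


# Made-up sample documents, for illustration only.
docs = [
    "A photovoltaic cell mounted on a flexible module.",
    "A solar cell integrated into a roof panel.",
    "A wind turbine blade with a composite panel.",
]

query = ["solar cell", "panel"]
plain = expand_query(query, {})               # no synonym dictionary
with_synonyms = expand_query(query, SYNONYMS)  # dictionary-based expansion

plain_hits = [d for d in docs if matches(d, plain)]
synonym_hits = [d for d in docs if matches(d, with_synonyms)]
```

With the literal query, only the second document is retrieved. With the expanded query, the first document is retrieved as well, because it uses "photovoltaic cell" and "module" instead of the query's wording.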
Cited by
1 article.
1. Identification Semantic Text of Indonesian Medical Terms from Question-Answer Data. 2022 6th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE), 2022-12-13.