Affiliation:
1. PSG College of Technology, India
Abstract
A Tamil question answering system (QAS) aims to find relevant answers in the user's native language. The proposed system helps farmers obtain agriculture-related information in Tamil. Tamil is a morphologically rich language, which makes it difficult to develop systems that process Tamil words. The list of Tamil stop words has to be collected manually. Parts-of-speech (POS) tagging is used to assign a suitable POS tag to each word in a sequence of Tamil words. The system employs a Hidden Markov Model (HMM)-based Viterbi algorithm, a machine learning technique, for POS tagging of Tamil words. The analyzed question is submitted to Google Search to obtain relevant documents. On top of Google Search, the locality-sensitive hashing (LSH) technique is used to retrieve the five relevant items for the input Tamil question. Jaccard similarity is then used to select the response from the retrieved document items. The proposed system is modelled using a dataset of 1000 sentences in the agriculture domain.
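Two of the steps named in the abstract, HMM-based Viterbi tagging and Jaccard-based answer selection, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`viterbi`, `jaccard`, `best_answer`), the smoothing constant `unk`, the whitespace tokenizer, and the toy probabilities in the usage note are all assumptions made for illustration, and English placeholder words stand in for Tamil tokens.

```python
from typing import Dict, List, Sequence, Set


def viterbi(
    words: Sequence[str],
    tags: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
    unk: float = 1e-6,  # assumed emission probability for unseen words
) -> List[str]:
    """Return the most probable tag sequence for `words` under an HMM."""
    # best[i][t]: probability of the best tag path that ends in tag t at word i
    best = [{t: start_p[t] * emit_p[t].get(words[0], unk) for t in tags}]
    # back[i][t]: previous tag on that best path
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] * trans_p[p][t]) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score * emit_p[t].get(words[i], unk)
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; taken as 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_answer(question: str, candidates: Sequence[str], stop_words: Set[str]) -> str:
    """Pick the candidate whose token set is most similar to the question's."""
    def tokens(text: str) -> Set[str]:
        # Assumed tokenizer: lowercase, split on whitespace, drop stop words.
        return {w for w in text.lower().split() if w not in stop_words}

    q = tokens(question)
    return max(candidates, key=lambda c: jaccard(q, tokens(c)))
```

With hypothetical toy parameters such as `start_p = {"N": 0.8, "V": 0.2}`, `trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}`, and `emit_p = {"N": {"farmer": 0.5, "rice": 0.5}, "V": {"grows": 1.0}}`, calling `viterbi(["farmer", "grows", "rice"], ["N", "V"], start_p, trans_p, emit_p)` tags the sentence as noun, verb, noun. A production tagger would typically work in log space to avoid underflow on long sentences, and the LSH retrieval stage is not shown here.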