Tamil Paraphrase Detection Using Encoder-Decoder Neural Networks
Abstract
Detecting paraphrases in Indian languages require critical analysis on the lexical, syntactic and semantic features. Since the structure of Indian languages differ from the other languages like English, the usage of lexico-syntactic features vary between the Indian languages and plays a critical role in determining the performance of the system. Instead of using various lexico-syntactic similarity features, we aim to apply a complete end-to-end system using deep learning networks with no lexico-syntactic features. In this paper we exploited the encoder-decoder model of deep neural network to analyze the paraphrase sentences in Tamil language and to classify. In this encoder-decoder model, LSTM, GRU units and gNMT are used as layers along with attention mechanism. Using this end-to-end model, there is an increase in f1-measure by 0.5% for the subtask-1 when compared to the state-of-the-art systems. The system was trained and evaluated on DPIL@FIRE2016 Shared Task dataset. To our knowledge, ours is the first deep learning model which validates the training instances of both the subtask-1 and subtask-2 dataset of DPIL shared task.
Origin | Files produced by the author(s) |
---|