ParaNMT-50M: Pushing the limits of paraphrastic sentence embeddings with millions of machine translations
We describe PARANMT-50M, a dataset of more than 50 million English-English sentential
paraphrase pairs. We generated the pairs automatically by using neural machine translation …
Multi-perspective sentence similarity modeling with convolutional neural networks
Modeling sentence similarity is complicated by the ambiguity and variability of linguistic
expression. To cope with these challenges, we propose a model for comparing sentences …
That's so annoying!!!: A lexical and frame-semantic embedding based data augmentation approach to automatic categorization of annoying behaviors using# …
We propose a novel data augmentation approach to enhance computational behavioral
analysis using social media text. In particular, we collect a Twitter corpus of the descriptions …
Pairwise word interaction modeling with deep neural networks for semantic similarity measurement
Textual similarity measurement is a challenging problem, as it requires understanding the
semantics of input sentences. Most previous neural network models use coarse-grained …
Paraphrasing revisited with neural machine translation
Recognizing and generating paraphrases is an important component in many natural
language processing applications. A well-established technique for automatically extracting …
A continuously growing dataset of sentential paraphrases
A major challenge in paraphrase research is the lack of parallel corpora. In this paper, we
present a new method to collect large-scale sentential paraphrases from Twitter by linking …
Neural network models for paraphrase identification, semantic textual similarity, natural language inference, and question answering
In this paper, we analyze several neural network designs (and their variations) for sentence
pair modeling and compare their performance extensively across eight datasets, including …
The BQ corpus: A large-scale domain-specific Chinese corpus for sentence semantic equivalence identification
This paper introduces the Bank Question (BQ) corpus, a Chinese corpus for sentence
semantic equivalence identification (SSEI). The BQ corpus contains 120,000 question pairs …
Multiple instance learning networks for fine-grained sentiment analysis
We consider the task of fine-grained sentiment analysis from the perspective of multiple
instance learning (MIL). Our neural model is trained on document sentiment labels, and …
A deep network model for paraphrase detection in short text messages
This paper is concerned with paraphrase detection, i.e., identifying sentences that are
semantically identical. The ability to detect similar sentences written in natural language is …