A survey of available corpora for building data-driven dialogue systems

IV Serban, R Lowe, P Henderson, L Charlin… - arXiv preprint arXiv …, 2015 - arxiv.org
During the past decade, several areas of speech and language understanding have
witnessed substantial breakthroughs from the use of data-driven models. In the area of …

The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis

S Alam, N Yao - Computational and Mathematical Organization Theory, 2019 - Springer
Big data and its related technologies have become active areas of research recently. There
is a huge amount of data generated every minute and second that includes unstructured …

A phrase-based statistical model for SMS text normalization

AT Aw, M Zhang, J Xiao, J Su - … of the COLING/ACL 2006 Main …, 2006 - aclanthology.org
Abstract Short Messaging Service (SMS) texts behave quite differently from normal written
texts and have some very special phenomena. To translate SMS texts, traditional …

Text normalization in social media: progress, problems and applications for a pre-processing system of casual English

E Clark, K Araki - Procedia-Social and Behavioral Sciences, 2011 - Elsevier
The rapid expansion in user-generated content on the Web of the 2000s, characterized by
social media, has led to Web content featuring somewhat less standardized language than …

Opinion mining from noisy text data

L Dey, SKM Haque - Proceedings of the second workshop on Analytics …, 2008 - dl.acm.org
The proliferation of Internet has not only generated huge volumes of unstructured
information in the form of web documents, but a large amount of text is also generated in the …

Reprint of: Computational approaches for mining user's opinions on the Web 2.0

G Petz, M Karpowicz, H Fürschuß, A Auinger… - Information Processing …, 2015 - Elsevier
The emerging research area of opinion mining deals with computational methods in order to
find, extract and systematically analyze people's opinions, attitudes and emotions towards …

Sentiment classification for Indonesian message in social media

AR Naradhipa, A Purwarianti - 2012 International Conference …, 2012 - ieeexplore.ieee.org
Nowadays, classifying sentiment from social media has been a strategic thing since people
can express their feeling about something in an easy way and short text. Mining opinion …

[BOOK] Web corpus construction

R Schäfer, F Bildhauer - 2013 - books.google.com
The World Wide Web constitutes the largest existing source of texts written in a great variety
of languages. A feasible and sound way of exploiting this data for linguistic research is to …

Email data cleaning

J Tang, H Li, Y Cao, Z Tang - Proceedings of the eleventh ACM SIGKDD …, 2005 - dl.acm.org
Addressed in this paper is the issue of 'email data cleaning' for text mining. Many text mining
applications need take emails as input. Email data is usually noisy and thus it is necessary …

A model of preprocessing for social media data extraction

DZ Abidin, S Nurmaini, RF Malik… - 2019 International …, 2019 - ieeexplore.ieee.org
Tropical disease grows fast and requires detection. One source of data for detections is
social media Twitter. However, social media data has data with diverse data structures …