SMS spam filtering: Methods and data

SJ Delany, M Buckley, D Greene - Expert Systems with Applications, 2012 - Elsevier
Mobile or SMS spam is a real and growing problem primarily due to the availability of very
cheap bulk pre-pay SMS packages and the fact that SMS engenders higher response rates …

Lexical normalisation of short text messages: Makn sens a #twitter

B Han, T Baldwin - Proceedings of the 49th annual meeting of the …, 2011 - aclanthology.org
Twitter provides access to large volumes of data in real time, but is notoriously noisy,
hampering its utility for NLP. In this paper, we target out-of-vocabulary words in short text …
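To give a rough idea of what targeting out-of-vocabulary (OOV) words involves, here is a generic sketch. It flags tokens that are missing from a lexicon and proposes in-vocabulary candidates by string similarity. It is not the paper's method. The toy `LEXICON`, the helper names `find_oov` and `suggest_candidates`, and the use of Python's `difflib.get_close_matches` as the similarity measure are all assumptions made for illustration.

```python
import difflib

# Hypothetical toy lexicon of in-vocabulary (IV) words; a real system would
# use a large dictionary.
LEXICON = {"a", "make", "making", "of", "sense", "since", "twitter"}


def find_oov(tokens, lexicon):
    """Return alphabetic tokens that are not in the lexicon.

    Non-alphabetic tokens (hashtags, mentions, numbers) are skipped here for
    simplicity; this filtering rule is an assumption of the sketch.
    """
    return [t for t in tokens if t.isalpha() and t.lower() not in lexicon]


def suggest_candidates(token, lexicon, n=3, cutoff=0.6):
    """Rank IV words by difflib's similarity ratio to the OOV token."""
    return difflib.get_close_matches(token.lower(), sorted(lexicon), n=n, cutoff=cutoff)


tokens = "makn sens of #twitter".split()
for oov in find_oov(tokens, LEXICON):
    print(oov, "->", suggest_candidates(oov, LEXICON))
```

String similarity alone cannot choose between close candidates such as `make` and `making` for `makn`. That choice needs further evidence, such as the surrounding context, so the sketch stops at candidate generation.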

A review of shorthand systems: From brachygraphy to microtext and beyond

R Satapathy, E Cambria, A Nanetti, A Hussain - Cognitive Computation, 2020 - Springer
Human civilizations have performed the art of writing across continents and over different
time periods. In order to speed up the writing process, the art of shorthand (brachygraphy) …

Neural models of text normalization for speech applications

H Zhang, R Sproat, AH Ng, F Stahlberg… - Computational …, 2019 - direct.mit.edu
Machine learning, including neural network techniques, have been applied to
virtually every domain in natural language processing. One problem that has been …

A broad-coverage normalization system for social media language

F Liu, F Weng, X Jiang - Proceedings of the 50th Annual Meeting …, 2012 - aclanthology.org
Social media language contains huge amount and wide variety of nonstandard tokens,
created both intentionally and unintentionally by the users. It is of crucial importance to …

RNN approaches to text normalization: A challenge

R Sproat, N Jaitly - arXiv preprint arXiv:1611.00068, 2016 - arxiv.org
This paper presents a challenge to the community: given a large corpus of written text
aligned to its normalized spoken form, train an RNN to learn the correct normalization …

Phonetic-based microtext normalization for twitter sentiment analysis

R Satapathy, C Guerreiro, I Chaturvedi… - … conference on data …, 2017 - ieeexplore.ieee.org
The proliferation of Web 2.0 technologies and the increasing use of computer-mediated
communication resulted in a new form of written text, termed microtext. This poses new …
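Phonetic normalization maps spellings that sound alike to a shared key. American Soundex is one classic phonetic key, and a sketch of it follows. The excerpt does not say which phonetic representation the paper uses, so treat the choice of Soundex, and the function name `soundex`, as assumptions made for illustration.

```python
# Standard American Soundex digit classes for consonants.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character American Soundex key of `word`."""
    # Keep letters only; digits such as the "8" in "gr8" are discarded.
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    first = word[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's code counts for adjacency
    for ch in word[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:  # adjacent letters with the same code are coded once
                digits.append(code)
            prev = code
        elif ch not in "hw":
            prev = ""  # vowels (and y) separate repeated codes; h and w do not
    return (first.upper() + "".join(digits) + "000")[:4]


print(soundex("tomorrow"), soundex("tmrw"))  # T560 T560
```

Soundex misses some common substitutions. For example, `gr8` and `great` receive different keys (G600 and G630) because the digit is discarded, so Soundex alone cannot recover number-for-syllable spellings like this one.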

Insertion, deletion, or substitution? Normalizing text messages without pre-categorization nor supervision

F Liu, F Weng, B Wang, Y Liu - … of the 49th Annual Meeting of the …, 2011 - aclanthology.org
Most text message normalization approaches are based on supervised learning and rely on
human labeled training data. In addition, the nonstandard words are often categorized into …
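The three operations named in the title are exactly the ones counted by Levenshtein edit distance. The sketch below computes that distance with the standard dynamic program. It illustrates the operations generically and is not the normalization model the paper proposes. `edit_distance` is a hypothetical name.

```python
def edit_distance(source, target):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `source` into `target` (Levenshtein)."""
    # prev[j] holds the distance between source[:i-1] and target[:j].
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i]  # turning source[:i] into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            curr.append(min(
                prev[j] + 1,         # delete s_char
                curr[j - 1] + 1,     # insert t_char
                prev[j - 1] + cost,  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("tmrw", "tomorrow"))  # 4
```

For example, `tmrw` is four insertions away from `tomorrow`. `gr8` is three edits from `great`: one substitution and two insertions.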

Contextual bearing on linguistic variation in social media

S Gouws, D Metzler, C Cai, E Hovy - Proceedings of the workshop …, 2011 - aclanthology.org
Microtexts, like SMS messages, Twitter posts, and Facebook status updates, are a popular
medium for real-time communication. In this paper, we investigate the writing conventions …

A log-linear model for unsupervised text normalization

Y Yang, J Eisenstein - Proceedings of the 2013 conference on …, 2013 - aclanthology.org
We present a unified unsupervised statistical model for text normalization. The relationship
between standard and non-standard tokens is characterized by a log-linear model …
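A generic log-linear model scores each candidate standard form s for a non-standard token t as P(s | t) = exp(θ · f(s, t)) / Σ_s' exp(θ · f(s', t)), where f is a feature vector and θ is a weight vector. The sketch below shows only this scoring step. The excerpt does not give the paper's exact formulation, which may parameterize the relationship differently. The three features, the hand-set weights, and the candidate list are hypothetical. In an unsupervised model, the weights would be estimated from unlabeled data rather than fixed by hand.

```python
import math


def is_subsequence(short, long):
    """True if every character of `short` appears in `long` in order."""
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def features(candidate, token) -> dict:
    """Hypothetical feature function f(s, t) for candidate s and token t."""
    return {
        "same_first_char": float(candidate[:1] == token[:1]),
        "token_is_subsequence": float(is_subsequence(token, candidate)),
        "length_gap": -float(abs(len(candidate) - len(token))),
    }


# Hypothetical hand-set weights theta; an unsupervised model would learn these.
WEIGHTS = {"same_first_char": 1.0, "token_is_subsequence": 3.0, "length_gap": 0.5}


def normalization_distribution(token, candidates, weights=WEIGHTS):
    """Softmax of theta . f(s, t) over the candidate set."""
    scores = {
        c: sum(weights[name] * value for name, value in features(c, token).items())
        for c in candidates
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(score - top) for c, score in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


dist = normalization_distribution("tmrw", ["tomorrow", "team", "tremor"])
print(max(dist, key=dist.get))  # tomorrow
```

Normalizing over a small candidate set keeps the partition sum cheap. The cost is that any correct form the candidate generator misses receives zero probability.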