SMS spam filtering: Methods and data
Mobile or SMS spam is a real and growing problem primarily due to the availability of very
cheap bulk pre-pay SMS packages and the fact that SMS engenders higher response rates …
Lexical normalisation of short text messages: Makn sens a #twitter
Twitter provides access to large volumes of data in real time, but is notoriously noisy,
hampering its utility for NLP. In this paper, we target out-of-vocabulary words in short text …
A review of shorthand systems: From brachygraphy to microtext and beyond
Human civilizations have performed the art of writing across continents and over different
time periods. In order to speed up the writing process, the art of shorthand (brachygraphy) …
Neural models of text normalization for speech applications
Machine learning, including neural network techniques, have been applied to
virtually every domain in natural language processing. One problem that has been …
A broad-coverage normalization system for social media language
Social media language contains huge amount and wide variety of nonstandard tokens,
created both intentionally and unintentionally by the users. It is of crucial importance to …
RNN approaches to text normalization: A challenge
This paper presents a challenge to the community: given a large corpus of written text
aligned to its normalized spoken form, train an RNN to learn the correct normalization …
Phonetic-based microtext normalization for twitter sentiment analysis
The proliferation of Web 2.0 technologies and the increasing use of computer-mediated
communication resulted in a new form of written text, termed microtext. This poses new …
Insertion, deletion, or substitution? Normalizing text messages without pre-categorization nor supervision
Most text message normalization approaches are based on supervised learning and rely on
human labeled training data. In addition, the nonstandard words are often categorized into …
Contextual bearing on linguistic variation in social media
Microtexts, like SMS messages, Twitter posts, and Facebook status updates, are a popular
medium for real-time communication. In this paper, we investigate the writing conventions …
A log-linear model for unsupervised text normalization
We present a unified unsupervised statistical model for text normalization. The relationship
between standard and non-standard tokens is characterized by a log-linear model …