Phonological representation: Beyond abstract versus episodic

JB Pierrehumbert - Annual Review of Linguistics, 2016 - annualreviews.org
Phonological representations capture information about individual word forms and about the
general characteristics of word forms in a language. To support the processing of novel word …

[BOOK] Modern information retrieval

R Baeza-Yates, B Ribeiro-Neto - 1999 - people.ischool.berkeley.edu
Information retrieval (IR) has changed considerably in recent years with the expansion of the
World Wide Web and the advent of modern and inexpensive graphical user interfaces and …

Tackling the poor assumptions of naive Bayes text classifiers

JD Rennie, L Shih, J Teevan, DR Karger - Proceedings of the 20th …, 2003 - cdn.aaai.org
Naive Bayes is often used as a baseline in text classification because it is fast and easy to
implement. Its severe assumptions make such efficiency possible but also adversely affect …

Automatic word sense discrimination

H Schütze - Computational linguistics, 1998 - aclanthology.org
This paper presents context-group discrimination, a disambiguation algorithm based on
clustering. Senses are interpreted as groups (or clusters) of similar contexts of the …

A scaling model for estimating time‐series party positions from texts

JB Slapin, SO Proksch - American Journal of Political Science, 2008 - Wiley Online Library
Recent advances in computational content analysis have provided scholars promising new
ways for estimating party positions. However, existing text‐based methods face challenges …

Comparing corpora

A Kilgarriff - International journal of corpus linguistics, 2001 - jbe-platform.com
Corpus linguistics lacks strategies for describing and comparing corpora. Currently most
descriptions of corpora are textual, and questions such as 'what sort of a corpus is this?', or …

Context-sensitive learning methods for text categorization

WW Cohen, Y Singer - ACM Transactions on Information Systems (TOIS), 1999 - dl.acm.org
Two recently implemented machine-learning algorithms, RIPPER and sleeping-experts for
phrases, are evaluated on a number of large text categorization problems. These algorithms …

[BOOK] Sprache und Wissen

N Bubenhofer - 2009 - degruyter.com
1 42.15 Human Rights Watch 46 19.76 die Fortsetzung der 2 39.52 und der Opposition 47
16.07 die Sicherheit der 3 15.81 Russland und der 48 15.81 die Freilassung der 4 32.93 der …

Dispersions and adjusted frequencies in corpora

ST Gries - International journal of corpus linguistics, 2008 - jbe-platform.com
The most frequent statistics in corpus linguistics are frequencies of occurrence and
frequencies of co-occurrence of two or more linguistic variables. However, such frequencies …

On the burstiness of visual elements

H Jégou, M Douze, C Schmid - 2009 IEEE conference on …, 2009 - ieeexplore.ieee.org
Burstiness, a phenomenon initially observed in text retrieval, is the property that a given
visual element appears more times in an image than a statistically independent model …