[PDF] Adaptive multilingual sentence boundary disambiguation

DD Palmer, MA Hearst - Computational Linguistics, 1997 - aclanthology.org
As an alternative, this article presents an efficient, trainable system for sentence boundary
disambiguation. The system, called Satz, makes simple estimates of the parts of speech of …
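The snippet says only that Satz estimates the parts of speech of tokens; the general idea of feeding part-of-speech estimates from the context around a period to a classifier can be sketched as follows. This is not Satz's implementation. The tag set, the toy `LEXICON`, the capitalization heuristic for unknown words, and the hand-set `WEIGHTS` (a stand-in for a trained classifier) are all assumptions made for illustration.

```python
import re

# Hypothetical tag set and toy lexicon of prior part-of-speech probabilities.
TAGS = ["noun", "proper", "verb", "det", "abbrev"]
LEXICON = {
    "the": {"det": 1.0},
    "dr": {"abbrev": 0.9, "noun": 0.1},
    "cat": {"noun": 1.0},
    "sat": {"verb": 1.0},
    "slept": {"verb": 1.0},
}

# Hand-set weights standing in for a trained classifier (assumed values).
# Their length assumes one token of context on each side (k=1).
WEIGHTS = [0.5, 0.5, 0.5, 0.0, -2.0,   # tags of the token before the period
           0.0, 0.5, 0.0, 0.5, 0.0]    # tags of the token after the period


def pos_vector(token):
    """Estimate a part-of-speech probability vector for one token."""
    probs = LEXICON.get(token.lower())
    if probs is None:
        # Unknown word: guess from capitalization (an assumed heuristic).
        probs = {"proper": 1.0} if token[:1].isupper() else {"noun": 1.0}
    return [probs.get(tag, 0.0) for tag in TAGS]


def context_features(tokens, i, k=1):
    """Concatenate the POS vectors of k tokens on each side of position i."""
    vec = []
    for j in list(range(i - k, i)) + list(range(i + 1, i + 1 + k)):
        vec += pos_vector(tokens[j]) if 0 <= j < len(tokens) else [0.0] * len(TAGS)
    return vec


def is_boundary(tokens, i):
    """Classify the period at position i as sentence-final or not."""
    if i == len(tokens) - 1:
        return True  # a period that ends the text always closes a sentence
    score = sum(w * x for w, x in zip(WEIGHTS, context_features(tokens, i)))
    return score > 0


def find_boundaries(text):
    """Tokenize crudely; return (tokens, indices of sentence-final periods)."""
    tokens = re.findall(r"\w+|\.", text)
    return tokens, [i for i, t in enumerate(tokens) if t == "." and is_boundary(tokens, i)]
```

For "Dr. Smith sat. The cat slept." the sketch accepts the periods at token indices 4 and 8 and rejects the one after "Dr", because the lexicon gives "dr" a high abbreviation probability and the weights penalize that.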

[PDF] Thoughts on word and sentence segmentation in Thai

W Aroonmanakun - Proceedings of the Seventh Symposium on …, 2007 - academia.edu
This paper discusses problems of word and sentence segmentation in Thai. Disagreements
on word segmentation are caused mostly by compound words. To set a standard resource …
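Because Thai is written without spaces between words, segmentation often relies on a dictionary, and whether a compound is listed as a single entry changes the output. The sketch below uses greedy longest matching, one common dictionary-based baseline (not necessarily the approach discussed in the paper), with unspaced English as a hypothetical stand-in for Thai text and two toy lexicons.

```python
def longest_match(text, lexicon, max_len=20):
    """Greedy left-to-right longest matching of unspaced text against a word list."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character
        # when nothing in the lexicon matches at position i.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


with_compound = {"ice", "cream", "icecream", "is", "cold"}
without_compound = {"ice", "cream", "is", "cold"}
print(longest_match("icecreamiscold", with_compound))     # ['icecream', 'is', 'cold']
print(longest_match("icecreamiscold", without_compound))  # ['ice', 'cream', 'is', 'cold']
```

The same input yields different word counts depending only on whether the compound is in the lexicon, which is the kind of disagreement a shared standard resource is meant to settle. Greedy matching can also mis-segment; maximal matching (choosing the segmentation with the fewest words) is a frequently used alternative.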

Hypertextsorten: Definition, Struktur, Klassifikation (Hypertext types: definition, structure, classification)

G Rehm - 2005 - jlupub.ub.uni-giessen.de
Search engines on the WWW index and search documents at great speed. Despite the
quantitatively impressive results, the quality of the …

[BOOK] Practical text mining with Perl

R Bilisoly - 2011 - books.google.com
Provides readers with the methods, algorithms, and means to perform text mining tasks. This
book is devoted to the fundamentals of text mining using Perl, an open-source programming …

System and method for adaptive automatic error correction

AB Carus, L Lapshina, B Rechea… - US Patent 7,565,282, 2009 - Google Patents
Abstract: A method for adaptive automatic error and mismatch correction is
disclosed for use with a system having an automatic error and mismatch correction learning …

System and method for tokenization of text using classifier models

J Carrier, AB Carus, WF Cote, J Dowd… - US Patent …, 2011 - Google Patents
The present invention pertains to a system and method for the tokenization of text. The
featurizer may be configured to receive input text and convert the input text into tokens …
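The snippet describes a featurizer that turns input text into tokens for a classifier to work on. The sketch below illustrates that general split-decision idea, not the patented system: each gap inside a whitespace-delimited chunk gets a few binary features, and a linear score decides whether to split there. The feature names and the hand-set `WEIGHTS` (standing in for a trained model) are hypothetical.

```python
# Hand-set weights standing in for a trained classifier (assumed values).
WEIGHTS = {"transition": 1.0, "digit_sep": -2.0, "apostrophe": -2.0}


def featurize(chunk, i):
    """Binary features for the gap just before character i of a chunk."""
    prev, cur = chunk[i - 1], chunk[i]
    nxt = chunk[i + 1] if i + 1 < len(chunk) else ""
    prev2 = chunk[i - 2] if i >= 2 else ""
    return {
        # Letter/digit next to punctuation is a candidate split point.
        "transition": prev.isalnum() != cur.isalnum(),
        # '.' or ',' between digits, as in "3.50", should stay inside the number.
        "digit_sep": (cur in ".," and prev.isdigit() and nxt.isdigit())
        or (prev in ".," and cur.isdigit() and prev2.isdigit()),
        # Keep English contractions such as "don't" whole.
        "apostrophe": "'" in (prev, cur),
    }


def should_split(features):
    """Linear score over active features; split when the score is positive."""
    return sum(WEIGHTS[name] for name, active in features.items() if active) > 0


def tokenize(text):
    tokens = []
    for chunk in text.split():  # whitespace is always a boundary
        start = 0
        for i in range(1, len(chunk)):
            if should_split(featurize(chunk, i)):
                tokens.append(chunk[start:i])
                start = i
        tokens.append(chunk[start:])
    return tokens


print(tokenize("Costs $3.50, don't panic."))
# ['Costs', '$', '3.50', ',', "don't", 'panic', '.']
```

Separating feature extraction from the decision keeps the rules replaceable: swapping the weight table for a learned model changes behavior without touching the featurizer.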

An analysis of sentence boundary detection systems for English and Portuguese documents

CN Silla Jr, CAA Kaestner - … Conference on Intelligent Text Processing and …, 2004 - Springer
In this paper we present a study comparing the performance of different systems found in the
literature that perform the task of automatic text segmentation into sentences for English …
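Comparisons of this kind usually score each system's predicted boundaries against a gold standard. As a hedged illustration, the sketch pairs a typical rule-based baseline (terminal punctuation followed by a capital letter, minus a toy abbreviation list) with boundary-level precision and recall. The baseline is not one of the systems evaluated in the paper, and `ABBREVIATIONS` is an assumed example list.

```python
import re

# Toy abbreviation list (an assumption; practical lists are much larger).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "etc."}


def baseline_boundaries(text):
    """Offsets just past each '.', '!' or '?' judged to end a sentence."""
    bounds = []
    # Candidate: terminal mark followed by whitespace and a capital, or by end of text.
    for m in re.finditer(r"[.!?](?=\s+[A-Z]|\s*$)", text):
        word_start = text.rfind(" ", 0, m.start()) + 1
        if text[word_start:m.end()].lower() not in ABBREVIATIONS:
            bounds.append(m.end())
    return bounds


def precision_recall(predicted, gold):
    """Boundary-level precision and recall against gold offsets."""
    hits = len(set(predicted) & set(gold))
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


text = "Dr. Silva bought pens, paper, etc. Then he left."
predicted = baseline_boundaries(text)     # [48]
gold = [34, 48]                           # "etc." really ends the first sentence here
print(precision_recall(predicted, gold))  # (1.0, 0.5)
```

The example shows the classic failure case: an abbreviation that also ends a sentence is skipped, so recall drops while precision stays perfect.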

[PDF] Shallow processing of Portuguese: From sentence chunking to nominal lemmatization

JRMF da Silva - 2007 - xisque.di.fc.ul.pt
This dissertation proposes a set of procedures for the computational processing of
Portuguese. Five tasks are covered: Sentence Segmentation, Tokenization, Part-of-Speech …
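One tokenization issue specific to Portuguese is that prepositions fuse with articles ("do" = "de" + "o", "na" = "em" + "a"). A common design choice is to expand such contractions into separate tokens; the sketch below assumes that choice and a small hand-written `CONTRACTIONS` table, and does not reproduce the dissertation's procedure. Some surface forms are ambiguous ("pelo" is also a noun meaning "hair"), so a table lookup alone will sometimes be wrong.

```python
import re

# Illustrative preposition + article contractions (far from exhaustive).
CONTRACTIONS = {
    "do": ["de", "o"], "da": ["de", "a"], "dos": ["de", "os"], "das": ["de", "as"],
    "no": ["em", "o"], "na": ["em", "a"], "ao": ["a", "o"],
    "pelo": ["por", "o"], "pela": ["por", "a"],
}


def tokenize_pt(sentence):
    """Split off punctuation, then expand known contractions into two tokens."""
    tokens = []
    # \w matches accented letters in Python 3 str patterns, so "está" stays whole.
    for tok in re.findall(r"\w+|[^\w\s]", sentence):
        tokens.extend(CONTRACTIONS.get(tok.lower(), [tok]))
    return tokens


print(tokenize_pt("O livro do aluno está na mesa."))
# ['O', 'livro', 'de', 'o', 'aluno', 'está', 'em', 'a', 'mesa', '.']
```

Expanded forms are emitted in lowercase, so a capitalized contraction at sentence start loses its case; a fuller tokenizer would preserve it.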

[PDF] Constitution et exploitation de bi-textes pour l'Aide à la traduction (Building and exploiting bitexts for translation aids)

P Bernhard - 2001 - turing3.univ-grenoble-alpes.fr
University of Nice Sophia Antipolis, UFR (Faculty) of Language Sciences. Thesis,
Language Sciences …

[PDF] A preliminary look into the use of named entity information for bioscience text tokenization

R Arens - Proceedings of the Student Research Workshop at HLT …, 2004 - aclanthology.org
Tokenization in the bioscience domain is often difficult. New terms, technical terminology,
and nonstandard orthography, all common in bioscience text, contribute to this difficulty. This …
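One way named entity information can help is by protecting known names whose internal hyphens or spaces a generic tokenizer would split. The sketch below assumes a small hypothetical `ENTITY_LEXICON` and tries lexicon matches before falling back to word/punctuation rules; it illustrates the idea rather than the method evaluated in the paper.

```python
import re

# Hypothetical toy lexicon of entity names whose internal punctuation
# or spaces must not be split.
ENTITY_LEXICON = ["NF-kappa B", "IL-2", "p53"]


def tokenize_bio(text, entities=ENTITY_LEXICON):
    """Match lexicon entries first (longest first), then generic word/punct rules."""
    names = sorted(entities, key=len, reverse=True)
    entity_alt = "|".join(re.escape(name) for name in names)
    # (?!\w) stops "IL-2" from matching the first part of "IL-20".
    pattern = re.compile(rf"(?:{entity_alt})(?!\w)|\w+|[^\w\s]")
    return pattern.findall(text)


print(tokenize_bio("IL-2 activates NF-kappa B in T-cells."))
# ['IL-2', 'activates', 'NF-kappa B', 'in', 'T', '-', 'cells', '.']
```

Names absent from the lexicon, such as "T-cells" here, still fall through to the generic rules, which is why coverage of new terminology remains the hard part.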