[PDF][PDF] Adaptive multilingual sentence boundary disambiguation
As an alternative, this article presents an efficient, trainable system for sentence boundary
disambiguation. The system, called Satz, makes simple estimates of the parts of speech of …
[PDF][PDF] Thoughts on word and sentence segmentation in Thai
W Aroonmanakun - Proceedings of the Seventh Symposium on …, 2007 - academia.edu
This paper discusses problems of word and sentence segmentation in Thai. Disagreements
on word segmentation are caused mostly from compound words. To set a standard resource …
Hypertextsorten: Definition, Struktur, Klassifikation
G Rehm - 2005 - jlupub.ub.uni-giessen.de
Search engines on the WWW index and search documents at great speed.
Despite the quantitatively impressive results, the quality of the …
[BOOK][B] Practical text mining with Perl
R Bilisoly - 2011 - books.google.com
Provides readers with the methods, algorithms, and means to perform text mining tasks. This
book is devoted to the fundamentals of text mining using Perl, an open-source programming …
System and method for adaptive automatic error correction
AB Carus, L Lapshina, B Rechea… - US Patent 7,565,282, 2009 - Google Patents
A method for adaptive automatic error and mismatch correction is
disclosed for use with a system having an automatic error and mismatch correction learning …
System and method for tokenization of text using classifier models
J Carrier, AB Carus, WF Cote, J Dowd… - US Patent …, 2011 - Google Patents
The present invention pertains to a system and method for the tokenization of text. The
featurizer may be configured to receive input text and convert the input text into tokens …
An analysis of sentence boundary detection systems for English and Portuguese documents
In this paper we present a study comparing the performance of different systems found in the
literature that perform the task of automatic text segmentation in sentences for English …
[PDF][PDF] Shallow processing of Portuguese: From sentence chunking to nominal lemmatization
JRMF da Silva - 2007 - xisque.di.fc.ul.pt
This dissertation proposes a set of procedures for the computational processing of
Portuguese. Five tasks are covered: Sentence Segmentation, Tokenization, Part-of-Speech …
[PDF][PDF] Constitution et exploitation de bi-textes pour l'Aide à la traduction
P Bernhard - 2001 - turing3.univ-grenoble-alpes.fr
Constitution et exploitation de bi-textes pour l'Aide à la traduction. Université de Nice
Sophia Antipolis, UFR de Sciences du Langage. Thesis, language sciences …
[PDF][PDF] A preliminary look into the use of named entity information for bioscience text tokenization
R Arens - Proceedings of the Student Research Workshop at HLT …, 2004 - aclanthology.org
Tokenization in the bioscience domain is often difficult. New terms, technical terminology,
and nonstandard orthography, all common in bioscience text, contribute to this difficulty. This …