Aksharantar: Towards building open transliteration tools for the next billion users
We introduce Aksharantar, the largest publicly available transliteration dataset for 21 Indic
languages containing 26 million transliteration pairs. We build this dataset by mining …
Grapheme-to-phoneme models for (almost) any language
Grapheme-to-phoneme (g2p) models are rarely available in low-resource
languages, as the creation of training and evaluation data is expensive and time-consuming …
Leveraging orthographic similarity for multilingual neural transliteration
We address the task of joint training of transliteration models for multiple language pairs
(multilingual transliteration). This is an instance of multitask learning, where individual tasks …
A large-scale evaluation of neural machine transliteration for Indic languages
We take up the task of large-scale evaluation of neural machine transliteration between
English and Indic languages, with a focus on multilingual transliteration to utilize …
XLEnt: Mining a large cross-lingual entity dataset with lexical-semantic-phonetic word alignment
Cross-lingual named-entity lexica are an important resource to multilingual NLP tasks such
as machine translation and cross-lingual wikification. While knowledge bases contain a …
Learning better name translation for cross-lingual wikification
A notable challenge in cross-lingual wikification is the problem of retrieving English
Wikipedia title candidates given a non-English mention, a step that requires translating …
Report of NEWS 2016 machine transliteration shared task
This report presents the results from the Machine Transliteration Shared Task conducted as
part of The Sixth Named Entities Workshop (NEWS 2016) held at ACL 2016 in Berlin …
Assamese Back Transliteration-An Empirical Study Over Canonical and Non-canonical Datasets
This study evaluates the performance of transformer-based state-of-the-art machine
transliteration systems in processing noisy transliterated texts (non-canonical form) from …
Burmese (Myanmar) name romanization: A sub-syllabic segmentation scheme for statistical solutions
We focus on Burmese name Romanization, a critical task in the translation of Burmese into
languages using Latin script. As Burmese is under researched and not well resourced, we …
Neural machine transliteration: Preliminary results
AH Jadidinejad - arXiv preprint arXiv:1609.04253, 2016 - arxiv.org
Machine transliteration is the process of automatically transforming the script of a word from
a source language to a target language, while preserving pronunciation. Sequence to …