Aksharantar: Towards building open transliteration tools for the next billion users

Y Madhani, S Parthan, P Bedekar, R Khapra… - arXiv preprint arXiv …, 2022 - academia.edu
We introduce Aksharantar, the largest publicly available transliteration dataset for 21 Indic
languages containing 26 million transliteration pairs. We build this dataset by mining …

Grapheme-to-phoneme models for (almost) any language

A Deri, K Knight - Proceedings of the 54th Annual Meeting of the …, 2016 - aclanthology.org
Grapheme-to-phoneme (g2p) models are rarely available in low-resource
languages, as the creation of training and evaluation data is expensive and time-consuming …

Leveraging orthographic similarity for multilingual neural transliteration

A Kunchukuttan, M Khapra, G Singh… - Transactions of the …, 2018 - direct.mit.edu
We address the task of joint training of transliteration models for multiple language pairs
(multilingual transliteration). This is an instance of multitask learning, where individual tasks …

A large-scale evaluation of neural machine transliteration for Indic languages

A Kunchukuttan, S Jain, R Kejriwal - … of the 16th Conference of the …, 2021 - aclanthology.org
We take up the task of large-scale evaluation of neural machine transliteration between
English and Indic languages, with a focus on multilingual transliteration to utilize …

XLEnt: Mining a large cross-lingual entity dataset with lexical-semantic-phonetic word alignment

A El-Kishky, A Renduchintala, J Cross… - arXiv preprint arXiv …, 2021 - arxiv.org
Cross-lingual named-entity lexica are an important resource to multilingual NLP tasks such
as machine translation and cross-lingual wikification. While knowledge bases contain a …

Learning better name translation for cross-lingual wikification

CT Tsai, D Roth - Proceedings of the AAAI Conference on Artificial …, 2018 - ojs.aaai.org
A notable challenge in cross-lingual wikification is the problem of retrieving English
Wikipedia title candidates given a non-English mention, a step that requires translating …

Report of NEWS 2016 machine transliteration shared task

X Duan, RE Banchs, M Zhang, H Li… - Proceedings of the …, 2016 - aclanthology.org
This report presents the results from the Machine Transliteration Shared Task conducted as
part of The Sixth Named Entities Workshop (NEWS 2016) held at ACL 2016 in Berlin …

Assamese Back Transliteration - An Empirical Study Over Canonical and Non-canonical Datasets

H Baruah, SR Singh, P Sarmah - Proceedings of the 37th Pacific …, 2023 - aclanthology.org
This study evaluates the performance of transformer-based state-of-the-art machine
transliteration systems in processing noisy transliterated texts (non-canonical form) from …

Burmese (Myanmar) name romanization: A sub-syllabic segmentation scheme for statistical solutions

C Ding, WP Pa, M Utiyama, E Sumita - Computational Linguistics: 15th …, 2018 - Springer
We focus on Burmese name Romanization, a critical task in the translation of Burmese into
languages using Latin script. As Burmese is under-researched and not well resourced, we …

Neural machine transliteration: Preliminary results

AH Jadidinejad - arXiv preprint arXiv:1609.04253, 2016 - arxiv.org
Machine transliteration is the process of automatically transforming the script of a word from
a source language to a target language, while preserving pronunciation. Sequence to …