Linguistically inspired roadmap for building biologically reliable protein language models

MH Vu, R Akbar, PA Robert, B Swiatczak… - Nature Machine …, 2023 - nature.com
Deep neural-network-based language models (LMs) are increasingly applied to large-scale
protein sequence data to predict protein function. However, being largely black-box models …

Between words and characters: A brief history of open-vocabulary modeling and tokenization in NLP

SJ Mielke, Z Alyafeai, E Salesky, C Raffel… - arXiv preprint arXiv …, 2021 - arxiv.org
What are the units of text that we want to model? From bytes to multi-word expressions, text
can be analyzed and generated at many granularities. Until recently, most natural language …

LLMs are good sign language translators

J Gong, LG Foo, Y He… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract Sign Language Translation (SLT) is a challenging task that aims to translate sign
videos into spoken language. Inspired by the strong translation capabilities of large …

LBPE: Long-token-first tokenization to improve large language models

H Lian, Y Xiong, Z Lin, J Niu, S Mo, H Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
The prevalent use of Byte Pair Encoding (BPE) in Large Language Models (LLMs) facilitates
robust handling of subword units and avoids issues of out-of-vocabulary words. Despite its …

Subword evenness (SuE) as a predictor of cross-lingual transfer to low-resource languages

O Pelloni, A Shaitarova… - Proceedings of the 2022 …, 2022 - aclanthology.org
Pre-trained multilingual models, such as mBERT, XLM-R and mT5, are used to improve the
performance on various tasks in low-resource languages via cross-lingual transfer. In this …

Interpreting character embeddings with perceptual representations: The case of shape, sound, and color

S Boldsen, M Agirrezabal… - Proceedings of the 60th …, 2022 - aclanthology.org
Character-level information is included in many NLP models, but evaluating the information
encoded in character representations is an open issue. We leverage perceptual …

Languages through the looking glass of BPE compression

X Gutierrez-Vasques, C Bentz… - Computational …, 2023 - direct.mit.edu
Byte-pair encoding (BPE) is widely used in NLP for performing subword tokenization. It
uncovers redundant patterns for compressing the data, and hence alleviates the sparsity …

Dialect representation learning with neural dialect-to-standard normalization

O Kuparinen, Y Scherrer - Tenth Workshop on NLP for Similar …, 2023 - aclanthology.org
Abstract Language label tokens are often used in multilingual neural language modeling
and sequence-to-sequence learning to enhance the performance of such models. An …

TeDDi sample: Text data diversity sample for language comparison and multilingual NLP

S Moran, C Bentz, X Gutierrez-Vasques… - Proceedings of the …, 2022 - aclanthology.org
We present the TeDDi sample, a diversity sample of text data for language comparison and
multilingual Natural Language Processing. The TeDDi sample currently features 89 …

Are you talking to ['xem'] or ['x','em']? On Tokenization and Addressing Misgendering in LLMs with Pronoun Tokenization Parity

A Ovalle, N Mehrabi, P Goyal, J Dhamala… - arXiv preprint arXiv …, 2023 - arxiv.org
A large body of NLP research has documented the ways gender biases manifest and
amplify within large language models (LLMs), though this research has predominantly …