Cross-lingual language model pretraining

A Conneau, G Lample - Advances in Neural Information …, 2019 - proceedings.neurips.cc
Recent studies have demonstrated the efficiency of generative pretraining for English
natural language understanding. In this work, we extend this approach to multiple …

XNLI: Evaluating cross-lingual sentence representations

A Conneau, G Lample, R Rinott, A Williams… - arXiv preprint arXiv …, 2018 - arxiv.org
State-of-the-art natural language processing systems rely on supervision in the form of
annotated data to learn competent models. These models are generally trained on data in a …

Learning word vectors for 157 languages

E Grave, P Bojanowski, P Gupta, A Joulin… - arXiv preprint arXiv …, 2018 - arxiv.org
Distributed word representations, or word vectors, have recently been applied to many tasks
in natural language processing, leading to state-of-the-art performance. A key ingredient to …
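Word vectors like those released with this paper are typically queried by cosine similarity to find semantically related words. The sketch below shows that lookup in pure Python; the toy 3-dimensional vectors are invented for illustration and are not taken from the 157-language fastText release.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word, vectors):
    """Return the most cosine-similar word to `word`, excluding itself."""
    query = vectors[word]
    return max((w for w in vectors if w != word),
               key=lambda w: cosine(query, vectors[w]))

# Toy vectors (assumed for illustration); real fastText vectors are 300-d.
vectors = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}
print(nearest("king", vectors))  # queen
```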

Unicoder: A universal language encoder by pre-training with multiple cross-lingual tasks

H Huang, Y Liang, N Duan, M Gong, L Shou… - arXiv preprint arXiv …, 2019 - arxiv.org
We present Unicoder, a universal language encoder that is insensitive to different
languages. Given an arbitrary NLP task, a model can be trained with Unicoder using training …

Fast WordPiece tokenization

X Song, A Salcianu, Y Song, D Dopson… - arXiv preprint arXiv …, 2020 - arxiv.org
Tokenization is a fundamental preprocessing step for almost all NLP tasks. In this paper, we
propose efficient algorithms for the WordPiece tokenization used in BERT, from single-word …
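The baseline this paper accelerates is BERT's greedy longest-match-first WordPiece algorithm: repeatedly take the longest vocabulary entry that prefixes the remaining word, marking continuation pieces with "##". A minimal sketch, assuming a toy vocabulary (the paper's contribution is a faster trie-based variant, not shown here):

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into subword pieces by greedy longest-match-first."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        matched = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                matched = piece
                break
            end -= 1  # shrink the candidate until it is in the vocabulary
        if matched is None:
            return [unk]  # no piece matched: the whole word is unknown
        pieces.append(matched)
        start = end
    return pieces

vocab = {"un", "##aff", "##able", "play", "##ing"}  # toy vocabulary (assumed)
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("playing", vocab))    # ['play', '##ing']
```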

[BOOK] Handbook of natural language processing

N Indurkhya, FJ Damerau - 2010 - taylorfrancis.com
The Handbook of Natural Language Processing, Second Edition presents practical tools
and techniques for implementing natural language processing in computer systems. Along …

Cross-lingual natural language generation via pre-training

Z Chi, L Dong, F Wei, W Wang, XL Mao… - Proceedings of the AAAI …, 2020 - ojs.aaai.org
In this work we focus on transferring supervision signals of natural language generation
(NLG) tasks between multiple languages. We propose to pretrain the encoder and the …

Mining quality phrases from massive text corpora

J Liu, J Shang, C Wang, X Ren, J Han - Proceedings of the 2015 ACM …, 2015 - dl.acm.org
Text data are ubiquitous and play an essential role in big data applications. However, text
data are mostly unstructured. Transforming unstructured text into structured units (e.g., …

MalBERTv2: Code aware BERT-based model for malware identification

A Rahali, MA Akhloufi - Big Data and Cognitive Computing, 2023 - mdpi.com
To proactively mitigate malware threats, cybersecurity tools, such as anti-virus and anti-
malware software, as well as firewalls, require frequent updates and proactive …

Bi-directional LSTM recurrent neural network for Chinese word segmentation

Y Yao, Z Huang - … : 23rd International Conference, ICONIP 2016, Kyoto …, 2016 - Springer
Recurrent neural network (RNN) has been broadly applied to natural language processing
(NLP) problems. This kind of neural network is designed for modeling sequential data and …
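Chinese word segmentation is commonly cast as per-character tagging with a B/M/E/S scheme (Begin/Middle/End of a word, or Single-character word), which the BiLSTM predicts character by character. The sketch below shows only the final decoding step, turning a tag sequence back into words; the example sentence and its tags are illustrative assumptions, not output from the paper's model.

```python
def decode_bmes(chars, tags):
    """Group characters into words according to their B/M/E/S tags."""
    words, current = [], []
    for ch, tag in zip(chars, tags):
        current.append(ch)
        if tag in ("E", "S"):  # a word ends on E (end) or S (single-char word)
            words.append("".join(current))
            current = []
    if current:  # flush a trailing word left open by a malformed tag sequence
        words.append("".join(current))
    return words

# Illustrative example (assumed): "我爱北京" segmented as 我 / 爱 / 北京
print(decode_bmes("我爱北京", ["S", "S", "B", "E"]))  # ['我', '爱', '北京']
```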