[BOOK][B] Text data mining

C Zong, R Xia, J Zhang - 2021 - Springer
With the rapid development and popularization of Internet and mobile communication
technologies, text data mining has attracted much attention. In particular, with the wide use …

Translate meanings, not just words: IdiomKB's role in optimizing idiomatic translation with language models

S Li, J Chen, S Yuan, X Wu, H Yang, S Tao… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
To translate well, machine translation (MT) systems and general-purposed language models
(LMs) need a deep understanding of both source and target languages and cultures …

Gated word-character recurrent language model

Y Miyamoto, K Cho - arXiv preprint arXiv:1606.01700, 2016 - arxiv.org
We introduce a recurrent neural network language model (RNN-LM) with long short-term
memory (LSTM) units that utilizes both character-level and word-level inputs. Our model has …

[PDF][PDF] Multi-granularity Chinese word embedding

R Yin, Q Wang, P Li, R Li, B Wang - Proceedings of the 2016 …, 2016 - aclanthology.org
This paper considers the problem of learning Chinese word embeddings. In contrast to
English, a Chinese word is usually composed of characters, and most of the characters …

AStitchInLanguageModels: Dataset and methods for the exploration of idiomaticity in pre-trained language models

HT Madabushi, E Gow-Smith, C Scarton… - arXiv preprint arXiv …, 2021 - arxiv.org
Despite their success in a variety of NLP tasks, pre-trained language models, due to their
heavy reliance on compositionality, fail in effectively capturing the meanings of multiword …

Phrase2Vec: phrase embedding based on parsing

Y Wu, S Zhao, W Li - Information Sciences, 2020 - Elsevier
Text is one of the most common unstructured data, and usually, the most primary task in text
mining is to transfer the text into a structured representation. However, the existing text …

Community answer generation based on knowledge graph

Y Wu, S Zhao - Information Sciences, 2021 - Elsevier
Community Question Answering (CQA) has become an indispensable way for
modern people to share and acquire knowledge. It allows users to ask questions, which will …

Getting BART to ride the idiomatic train: Learning to represent idiomatic expressions

Z Zeng, S Bhat - Transactions of the Association for Computational …, 2022 - direct.mit.edu
Idiomatic expressions (IEs), characterized by their non-compositionality, are an important
part of natural language. They have been a classical challenge to NLP, including pre-trained …

Joint semantic synthesis and morphological analysis of the derived word

R Cotterell, H Schütze - Transactions of the Association for …, 2018 - direct.mit.edu
Much like sentences are composed of words, words themselves are composed of smaller
units. For example, the English word questionably can be analyzed as question+able+ly …

Phrase embedding learning from internal and external information based on autoencoder

R Li, Q Yu, S Huang, L Shen, C Wei, X Sun - Information Processing & …, 2021 - Elsevier
Phrase embedding can improve the performance of multiple NLP tasks. Most of the previous
phrase-embedding methods that only use the external or internal semantic information of …