[BOOK][B] Text data mining
With the rapid development and popularization of Internet and mobile communication
technologies, text data mining has attracted much attention. In particular, with the wide use …
Translate meanings, not just words: Idiomkb's role in optimizing idiomatic translation with language models
To translate well, machine translation (MT) systems and general-purposed language models
(LMs) need a deep understanding of both source and target languages and cultures …
Gated word-character recurrent language model
Y Miyamoto, K Cho - arXiv preprint arXiv:1606.01700, 2016 - arxiv.org
We introduce a recurrent neural network language model (RNN-LM) with long short-term
memory (LSTM) units that utilizes both character-level and word-level inputs. Our model has …
[PDF][PDF] Multi-granularity Chinese word embedding
This paper considers the problem of learning Chinese word embeddings. In contrast to
English, a Chinese word is usually composed of characters, and most of the characters …
AStitchInLanguageModels: Dataset and methods for the exploration of idiomaticity in pre-trained language models
Despite their success in a variety of NLP tasks, pre-trained language models, due to their
heavy reliance on compositionality, fail in effectively capturing the meanings of multiword …
Phrase2Vec: phrase embedding based on parsing
Y Wu, S Zhao, W Li - Information Sciences, 2020 - Elsevier
Text is one of the most common unstructured data, and usually, the most primary task in text
mining is to transfer the text into a structured representation. However, the existing text …
Community answer generation based on knowledge graph
Y Wu, S Zhao - Information Sciences, 2021 - Elsevier
Community Question Answering (CQA) has become an indispensable way for
modern people to share and acquire knowledge. It allows users to ask questions, which will …
Getting BART to ride the idiomatic train: Learning to represent idiomatic expressions
Idiomatic expressions (IEs), characterized by their non-compositionality, are an important
part of natural language. They have been a classical challenge to NLP, including pre-trained …
Joint semantic synthesis and morphological analysis of the derived word
Much like sentences are composed of words, words themselves are composed of smaller
units. For example, the English word questionably can be analyzed as question+able+ly …
Phrase embedding learning from internal and external information based on autoencoder
R Li, Q Yu, S Huang, L Shen, C Wei, X Sun - Information Processing & …, 2021 - Elsevier
Phrase embedding can improve the performance of multiple NLP tasks. Most of the previous
phrase-embedding methods that only use the external or internal semantic information of …