Datasets for large language models: A comprehensive survey

Y Liu, J Cao, C Liu, K Ding, L ** - arXiv preprint arXiv:2402.18041, 2024 - arxiv.org
This paper embarks on an exploration into the Large Language Model (LLM) datasets,
which play a crucial role in the remarkable advancements of LLMs. The datasets serve as …

A survey on recent approaches for natural language processing in low-resource scenarios

MA Hedderich, L Lange, H Adel, J Strötgen… - arXiv preprint arXiv …, 2020 - arxiv.org
Deep neural networks and huge language models are becoming omnipresent in natural
language applications. As they are known for requiring large amounts of training data, there …

A survey on named entity recognition—datasets, tools, and methodologies

B Jehangir, S Radhakrishnan, R Agarwal - Natural Language Processing …, 2023 - Elsevier
Natural language processing (NLP) is crucial in the current processing of data because it
takes into account many sources, formats, and purposes of data as well as information from …

A review on method entities in the academic literature: Extraction, evaluation, and application

Y Wang, C Zhang, K Li - Scientometrics, 2022 - Springer
In scientific research, the method is an indispensable means to solve scientific problems and
a critical research object. With the advancement of sciences, many scientific methods are …

Leverage lexical knowledge for Chinese named entity recognition via collaborative graph network

D Sui, Y Chen, K Liu, J Zhao, S Liu - Proceedings of the 2019 …, 2019 - aclanthology.org
The lack of word boundaries information has been seen as one of the main obstacles to
develop a high performance Chinese named entity recognition (NER) system. Fortunately …

Noisy-labeled NER with confidence estimation

K Liu, Y Fu, C Tan, M Chen, N Zhang, S Huang… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent studies in deep learning have shown significant progress in named entity
recognition (NER). Most existing works assume clean data annotation, yet a fundamental …

Empirical analysis of unlabeled entity problem in named entity recognition

Y Li, L Liu, S Shi - arXiv preprint arXiv:2012.05426, 2020 - arxiv.org
In many scenarios, named entity recognition (NER) models severely suffer from unlabeled
entity problem, where the entities of a sentence may not be fully annotated. Through …

A pre-training and self-training approach for biomedical named entity recognition

S Gao, O Kotevska, A Sorokine, JB Christian - PloS one, 2021 - journals.plos.org
Named entity recognition (NER) is a key component of many scientific literature mining
tasks, such as information retrieval, information extraction, and question answering; …

MisRoBÆRTa: Transformers versus misinformation

CO Truică, ES Apostol - Mathematics, 2022 - mdpi.com
Misinformation is considered a threat to our democratic values and principles. The spread of
such content on social media polarizes society and undermines public discourse by …

EcomGPT-CT: Continual pre-training of e-commerce large language models with semi-structured data

S Ma, S Huang, S Huang, X Wang, Y Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) pre-trained on massive corpora have exhibited remarkable
performance on various NLP tasks. However, applying these models to specific domains still …