Automated identification of media bias in news articles: an interdisciplinary literature review

F Hamborg, K Donnay, B Gipp - International Journal on Digital Libraries, 2019 - Springer
Media bias, i.e., slanted news coverage, can strongly impact the public perception of the
reported topics. In the social sciences, research over the past decades has developed …

BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation

J Chen, S Xiao, P Zhang, K Luo, D Lian… - arxiv preprint arxiv …, 2024 - arxiv.org
In this paper, we present a new embedding model, called M3-Embedding, which is
distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity. It …
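
The multi-functionality claim (dense, sparse, and multi-vector retrieval from a single model) is easiest to picture with a short usage sketch. This assumes the authors' FlagEmbedding package and the BAAI/bge-m3 checkpoint; the method names and output keys below are recalled from that package's documentation and may differ across versions.

```python
from FlagEmbedding import BGEM3FlagModel

# Assumed package/checkpoint; see the FlagEmbedding documentation for the released API.
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

sentences = [
    "What is BGE M3-Embedding?",
    "BM25 is a classical bag-of-words retrieval function.",
]

# One encode call can return all three representations ("multi-functionality").
output = model.encode(
    sentences,
    return_dense=True,        # single dense vector per sentence
    return_sparse=True,       # lexical (sparse) term weights
    return_colbert_vecs=True, # per-token multi-vector representation
)

dense_vecs = output["dense_vecs"]
lexical_weights = output["lexical_weights"]
colbert_vecs = output["colbert_vecs"]
print(dense_vecs.shape, len(lexical_weights), len(colbert_vecs))
```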

Sheared LLaMA: Accelerating language model pre-training via structured pruning

M Xia, T Gao, Z Zeng, D Chen - arxiv preprint arxiv:2310.06694, 2023 - arxiv.org
The popularity of LLaMA (Touvron et al., 2023a; b) and other recently emerged moderate-
sized large language models (LLMs) highlights the potential of building smaller yet powerful …
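
A practical consequence of structured pruning is that the pruned model remains a standard dense LLaMA-architecture checkpoint, just smaller, so it loads like any other causal LM. A minimal sketch with Hugging Face transformers; the model id below is an assumption, so check the Hub for the checkpoints the authors actually released.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Sheared-LLaMA-1.3B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Structured pruning removes whole heads, hidden dimensions, and layers, so no
# sparse kernels or custom loading code are needed at inference time.
inputs = tokenizer("Structured pruning keeps the architecture dense but smaller.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)
```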

One embedder, any task: Instruction-finetuned text embeddings

H Su, W Shi, J Kasai, Y Wang, Y Hu… - arxiv preprint arxiv …, 2022 - arxiv.org
We introduce INSTRUCTOR, a new method for computing text embeddings given task
instructions: every text input is embedded together with instructions explaining the use case …
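
The core idea, embedding every input together with a natural-language description of its use case, can be illustrated with a short sketch. This assumes the authors' InstructorEmbedding package and the hkunlp/instructor-large checkpoint; the encode call taking (instruction, text) pairs reflects that package's documented usage, but treat the exact API as an assumption.

```python
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR("hkunlp/instructor-large")  # assumed checkpoint name

# Each input is a (task instruction, text) pair; the instruction conditions the embedding,
# so the same text gets different vectors for retrieval vs. classification use cases.
pairs = [
    ["Represent the scientific title for retrieval:",
     "Sheared LLaMA: accelerating pre-training via structured pruning"],
    ["Represent the news comment for classification:",
     "The coverage of this topic felt one-sided."],
]
embeddings = model.encode(pairs)
print(embeddings.shape)  # (2, embedding_dim)
```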

Large language model unlearning via embedding-corrupted prompts

C Liu, Y Wang, J Flanigan, Y Liu - Advances in Neural …, 2025 - proceedings.neurips.cc
Large language models (LLMs) have advanced to encompass extensive knowledge across
diverse domains. Yet controlling what a large language model should not know is important …

RoBERTa: A robustly optimized BERT pretraining approach

Y Liu, M Ott, N Goyal, J Du, M Joshi, D Chen… - arxiv preprint arxiv …, 2019 - arxiv.org
Language model pretraining has led to significant performance gains but careful
comparison between different approaches is challenging. Training is computationally …

Nomic Embed: Training a reproducible long context text embedder

Z Nussbaum, JX Morris, B Duderstadt… - arxiv preprint arxiv …, 2024 - arxiv.org
This technical report describes the training of nomic-embed-text-v1, the first fully
reproducible, open-source, open-weights, open-data, 8192 context length English text …
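
As a usage sketch (not the training recipe the report describes), the released checkpoint can be queried through sentence-transformers. The model id, the trust_remote_code flag, and the "search_document:" / "search_query:" task prefixes are assumptions based on the public model card.

```python
from sentence_transformers import SentenceTransformer, util

# Assumed model id and prefix convention; consult the nomic-embed model card.
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

docs = ["search_document: The 8192-token context window targets long documents."]
query = ["search_query: long context text embedding model"]

doc_emb = model.encode(docs)
query_emb = model.encode(query)
print(util.cos_sim(query_emb, doc_emb))
```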

Task-aware retrieval with instructions

A Asai, T Schick, P Lewis, X Chen, G Izacard… - arxiv preprint arxiv …, 2022 - arxiv.org
We study the problem of retrieval with instructions, where users of a retrieval system
explicitly describe their intent along with their queries. We aim to develop a general-purpose …
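
The setting is easiest to picture as a toy retrieval loop in which the user's stated intent is concatenated with the query before encoding. This is only a conceptual sketch using a generic off-the-shelf sentence-transformers encoder, not the retriever the paper trains.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in encoder, not the paper's model

instruction = "Retrieve a scientific paper abstract that answers the question."  # user-stated intent
query = "How can media bias in news articles be detected automatically?"

documents = [
    "Media bias, i.e., slanted news coverage, can strongly impact public perception of reported topics.",
    "We present a recipe for cooking risotto with seasonal vegetables.",
]

# Intent and query are encoded together, so the same query can be routed differently
# depending on the instruction the user provides.
query_emb = model.encode(f"{instruction} {query}")
doc_embs = model.encode(documents)
print(util.cos_sim(query_emb, doc_embs))
```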

Investigating the effectiveness of task-agnostic prefix prompt for instruction following

S Ye, H Hwang, S Yang, H Yun, Y Kim… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In this paper, we present our finding that prepending a Task-Agnostic Prefix Prompt (TAPP)
to the input improves the instruction-following ability of various Large Language Models …
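
Because the prefix is task-agnostic, the technique amounts to string concatenation at inference time. A minimal sketch; the prefix text and the demonstration format below are placeholders, not the prompt the paper actually uses.

```python
# Hypothetical task-agnostic prefix: a fixed demonstration block prepended to
# every request, regardless of the downstream task.
TASK_AGNOSTIC_PREFIX = (
    "Below are examples of instructions and appropriate responses.\n"
    "Instruction: Summarize the text in one sentence.\n"
    "Response: <one-sentence summary>\n\n"
)

def build_prompt(user_instruction: str, user_input: str) -> str:
    """Prepend the fixed prefix to whatever instruction/input the user provides."""
    return (
        f"{TASK_AGNOSTIC_PREFIX}"
        f"Instruction: {user_instruction}\nInput: {user_input}\nResponse:"
    )

print(build_prompt("Classify the sentiment.", "The new embedding model works surprisingly well."))
```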

Towards continual knowledge learning of language models

J Jang, S Ye, S Yang, J Shin, J Han, G Kim… - arxiv preprint arxiv …, 2021 - arxiv.org
Large Language Models (LMs) are known to encode world knowledge in their parameters
as they pretrain on vast amounts of web text, which is often utilized for performing …