A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Empowering biomedical discovery with AI agents

S Gao, A Fang, Y Huang, V Giunchiglia, A Noori… - Cell, 2024 - cell.com
We envision" AI scientists" as systems capable of skeptical learning and reasoning that
empower biomedical research through collaborative agents that integrate AI models and …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arXiv preprint arXiv …, 2023 - paper-notes.zhjwpku.com
Ever since the Turing Test was proposed in the 1950s, humans have explored how machines
can master language intelligence. Language is essentially a complex, intricate system of …

C-Pack: Packed resources for general Chinese embeddings

S Xiao, Z Liu, P Zhang, N Muennighoff, D Lian… - Proceedings of the 47th …, 2024 - dl.acm.org
We introduce C-Pack, a package of resources that significantly advances the field of general
text embeddings for Chinese. C-Pack includes three critical resources. 1) C-MTP is a …

Augmenting large language models with chemistry tools

AM Bran, S Cox, O Schilter, C Baldassari… - Nature Machine …, 2024 - nature.com
Large language models (LLMs) have shown strong performance in tasks across domains
but struggle with chemistry-related problems. These models also lack access to external …

Exploring the potential of large language models (LLMs) in learning on graphs

Z Chen, H Mao, H Li, W Jin, H Wen, X Wei… - ACM SIGKDD …, 2024 - dl.acm.org
Learning on Graphs has attracted immense attention due to its wide real-world applications.
The most popular pipeline for learning on graphs with textual node attributes primarily relies …

Improving text embeddings with large language models

L Wang, N Yang, X Huang, L Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we introduce a novel and simple method for obtaining high-quality text
embeddings using only synthetic data and less than 1k training steps. Unlike existing …

BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation

J Chen, S Xiao, P Zhang, K Luo, D Lian… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we present a new embedding model, called M3-Embedding, which is
distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity. It …

Large language models for information retrieval: A survey

Y Zhu, H Yuan, S Wang, J Liu, W Liu, C Deng… - arXiv preprint arXiv …, 2023 - arxiv.org
As a primary means of information acquisition, information retrieval (IR) systems, such as
search engines, have integrated themselves into our daily lives. These systems also serve …

Text embeddings by weakly-supervised contrastive pre-training

L Wang, N Yang, X Huang, B Jiao, L Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a
wide range of tasks. The model is trained in a contrastive manner with weak supervision …