LLMs-as-judges: a comprehensive survey on LLM-based evaluation methods
The rapid advancement of Large Language Models (LLMs) has driven their expanding
application across various fields. One of the most promising applications is their role as …
REEF: Representation encoding fingerprints for large language models
Protecting the intellectual property of open-source Large Language Models (LLMs) is very
important, because training LLMs costs extensive computational resources and data …
Synthesizing post-training data for LLMs through multi-agent simulation
Post-training is essential for enabling large language models (LLMs) to follow human
instructions. Inspired by the recent success of using LLMs to simulate human society, we …
Align anything: Training all-modality models to follow instructions with language feedback
Reinforcement learning from human feedback (RLHF) has proven effective in enhancing the
instruction-following capabilities of large language models; however, it remains …
VLSBench: Unveiling visual leakage in multimodal safety
Safety concerns of multimodal large language models (MLLMs) have gradually become an
important problem in various applications. Surprisingly, previous works indicate a counter …
Position: LLM unlearning benchmarks are weak measures of progress
Unlearning methods have the potential to improve the privacy and safety of large language
models (LLMs) by removing sensitive or harmful information post hoc. The LLM unlearning …
Beyond Scalar Reward Model: Learning Generative Judge from Preference Data
Learning from preference feedback is a common practice for aligning large language
models (LLMs) with human values. Conventionally, preference data is learned and encoded …
Course-correction: Safety alignment using synthetic preferences
The risk of harmful content generated by large language models (LLMs) has become a critical
concern. This paper presents a systematic study on assessing and improving LLMs' …
Aligner: Efficient alignment by learning to correct
With the rapid development of large language models (LLMs) and ever-evolving practical
requirements, finding an efficient and effective alignment method has never been more …
Targeted manipulation and deception emerge when optimizing LLMs for user feedback
As LLMs become more widely deployed, there is increasing interest in directly optimizing for
feedback from end users (e.g., thumbs up) in addition to feedback from paid annotators …