- Academic Search

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer

For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

Cited by 756 - All 6 versions


[PDF] arxiv.org

Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Cited by 487 - All 4 versions


[PDF] neurips.cc

Are aligned neural networks adversarially aligned?

N Carlini, M Nasr… - Advances in …, 2023 - proceedings.neurips.cc

Large language models are now tuned to align with the goals of their creators, namely to be
"helpful and harmless." These models should respond helpfully to user questions, but refuse …

Cited by 273 - All 8 versions


[PDF] acm.org

Regulating ChatGPT and other large generative AI models

P Hacker, A Engel, M Mauer - Proceedings of the 2023 ACM conference …, 2023 - dl.acm.org

Large generative AI models (LGAIMs), such as ChatGPT, GPT-4 or Stable Diffusion, are
rapidly transforming the way we communicate, illustrate, and create. However, AI regulation …

Cited by 457 - All 3 versions


[PDF] neurips.cc

Towards automated circuit discovery for mechanistic interpretability

A Conmy, A Mavor-Parker, A Lynch… - Advances in …, 2023 - proceedings.neurips.cc

Through considerable effort and intuition, several recent works have reverse-engineered
nontrivial behaviors of transformer models. This paper systematizes the mechanistic …

Cited by 231 - All 7 versions


[PDF] academia.edu

Artificial Intelligence Trust, Risk and Security Management (AI TRiSM): Frameworks, applications, challenges and future research directions

A Habbal, MK Ali, MA Abuzaraida - Expert Systems with Applications, 2024 - Elsevier

Artificial Intelligence (AI) has become pervasive, enabling transformative advancements in
various industries including smart city, smart healthcare, smart manufacturing, smart virtual …

Cited by 211 - All 3 versions


[PDF] thecvf.com

The stable signature: Rooting watermarks in latent diffusion models

P Fernandez, G Couairon, H Jégou… - Proceedings of the …, 2023 - openaccess.thecvf.com

Generative image modeling enables a wide range of applications but raises ethical
concerns about responsible deployment. This paper introduces an active strategy combining …

Cited by 200 - All 10 versions


[PDF] springer.com

Ethical principles for artificial intelligence in education

A Nguyen, HN Ngo, Y Hong, B Dang… - Education and …, 2023 - Springer

The advancement of artificial intelligence in education (AIED) has the potential to transform
the educational landscape and influence the role of all involved stakeholders. In recent …

Cited by 703 - All 13 versions


[PDF] arxiv.org

HarmBench: A standardized evaluation framework for automated red teaming and robust refusal

M Mazeika, L Phan, X Yin, A Zou, Z Wang, N Mu… - arXiv preprint arXiv …, 2024 - arxiv.org

Automated red teaming holds substantial promise for uncovering and mitigating the risks
associated with the malicious use of large language models (LLMs), yet the field lacks a …

Cited by 201 - All 8 versions


[PDF] arxiv.org

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org

This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Cited by 134 - All 7 versions
