Understanding practices, challenges, and opportunities for user-engaged algorithm auditing in industry practice

WH Deng, B Guo, A Devrio, H Shen, M Eslami… - Proceedings of the …, 2023 - dl.acm.org
Recent years have seen growing interest among both researchers and practitioners in user-
engaged approaches to algorithm auditing, which directly engage users in detecting …

AI transparency in the age of LLMs: A human-centered research roadmap

QV Liao, JW Vaughan - arXiv preprint arXiv:2306.01941, 2023 - assets.pubpub.org
The rise of powerful large language models (LLMs) brings about tremendous opportunities
for innovation but also looming risks for individuals and society at large. We have reached a …

Increasing diversity while maintaining accuracy: Text data generation with large language models and human interventions

JJY Chung, E Kamar, S Amershi - arXiv preprint arXiv:2306.04140, 2023 - arxiv.org
Large language models (LLMs) can be used to generate text data for training and evaluating
other models. However, creating high-quality datasets with LLMs can be challenging. In this …

Hierarchical text classification and its foundations: A review of current research

A Zangari, M Marcuzzo, M Rizzo, L Giudice, A Albarelli… - Electronics, 2024 - mdpi.com
While collections of documents are often annotated with hierarchically structured concepts,
the benefits of these structures are rarely taken into account by classification techniques …

Toward trustworthy AI development: mechanisms for supporting verifiable claims

M Brundage, S Avin, J Wang, H Belfield… - arXiv preprint arXiv …, 2020 - arxiv.org
With the recent wave of progress in artificial intelligence (AI) has come a growing awareness
of the large-scale impacts of AI systems, and recognition that existing regulations and norms …

Supporting human-AI collaboration in auditing LLMs with LLMs

C Rastogi, M Tulio Ribeiro, N King, H Nori… - Proceedings of the 2023 …, 2023 - dl.acm.org
Large language models (LLMs) are increasingly becoming all-powerful and pervasive via
deployment in sociotechnical systems. Yet these language models, be it for classification or …

Evaluating models' local decision boundaries via contrast sets

M Gardner, Y Artzi, V Basmova, J Berant… - arXiv preprint arXiv …, 2020 - arxiv.org
Standard test sets for supervised learning evaluate in-distribution generalization.
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these …

EvalLM: Interactive evaluation of large language model prompts on user-defined criteria

TS Kim, Y Lee, J Shin, YH Kim, J Kim - … of the 2024 CHI Conference on …, 2024 - dl.acm.org
By simply composing prompts, developers can prototype novel generative applications with
Large Language Models (LLMs). To refine prototypes into products, however, developers …

HateCheck: Functional tests for hate speech detection models

P Röttger, B Vidgen, D Nguyen, Z Waseem… - arXiv preprint arXiv …, 2020 - arxiv.org
Detecting online hate is a difficult task that even state-of-the-art models struggle with.
Typically, hate speech detection models are evaluated by measuring their performance on …

Polyjuice: Generating counterfactuals for explaining, evaluating, and improving models

T Wu, MT Ribeiro, J Heer, DS Weld - arXiv preprint arXiv:2101.00288, 2021 - arxiv.org
While counterfactual examples are useful for analysis and training of NLP models, current
generation methods either rely on manual labor to create very few counterfactuals, or only …