| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Exploring How Machine Learning Practitioners (Try To) Use Fairness Toolkits | WH Deng, M Nagireddy, MSA Lee, J Singh, ZS Wu, K Holstein, H Zhu | 2022 ACM Conference on Fairness, Accountability, and Transparency, 473-484, 2022 | 92 | 2022 |
| Detectors for safe and reliable llms: Implementations, uses, and limitations | S Achintalwar, AA Garcia, A Anaby-Tavor, I Baldini, SE Berger, ... | arXiv preprint arXiv:2403.06009, 2024 | 14 | 2024 |
| SocialStigmaQA: A Benchmark to Uncover Stigma Amplification in Generative Language Models | M Nagireddy, L Chiazor, M Singh, I Baldini | Proceedings of the 2024 AAAI Conference on Artificial Intelligence, 2023 | 14 | 2023 |
| A sandbox tool to bias (Stress)-test fairness algorithms | NJ Akpinar, M Nagireddy, L Stapleton, HF Cheng, H Zhu, S Wu, H Heidari | EAAMO 2022 Poster, 2022 | 13 | 2022 |
| The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers | H Mozannar, V Chen, M Alsobay, S Das, S Zhao, D Wei, M Nagireddy, ... | arXiv preprint arXiv:2404.02806, 2024 | 9* | 2024 |
| Language Models in Dialogue: Conversational Maxims for Human-AI Interactions | E Miehling, M Nagireddy, P Sattigeri, EM Daly, D Piorkowski, JT Richards | arXiv preprint arXiv:2403.15115, 2024 | 8 | 2024 |
| Comvas: Contextual moral values alignment system | I Padhi, P Dognin, J Rios, R Luss, S Achintalwar, M Riemer, M Liu, ... | Proc. Int. Joint Conf. Artif. Intell, 8759-8762, 2024 | 4 | 2024 |
| Multi-Level Explanations for Generative Language Models | LM Paes, D Wei, HJ Do, H Strobelt, R Luss, A Dhurandhar, M Nagireddy, ... | arXiv preprint arXiv:2403.14459, 2024 | 4 | 2024 |
| Programming refusal with conditional activation steering | BW Lee, I Padhi, KN Ramamurthy, E Miehling, P Dognin, M Nagireddy, ... | arXiv preprint arXiv:2409.05907, 2024 | 3 | 2024 |
| Alignment studio: Aligning large language models to particular contextual regulations | S Achintalwar, I Baldini, D Bouneffouf, J Byamugisha, M Chang, P Dognin, ... | IEEE Internet Computing, 2024 | 3 | 2024 |
| Contextual Moral Value Alignment Through Context-Based Aggregation | P Dognin, J Rios, R Luss, I Padhi, MD Riemer, M Liu, P Sattigeri, ... | arXiv preprint arXiv:2403.12805, 2024 | 3 | 2024 |
| DARE to Diversify: DAta Driven and Diverse LLM REd Teaming | M Nagireddy, B Guillén Pegueroles, I Baldini | Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and …, 2024 | 2 | 2024 |
| Prompt Templates: A Methodology for Improving Manual Red Teaming Performance | B Dominique, D Piorkowski, M Nagireddy, I Baldini | ACM CHI Conference on Human Factors in Computing Systems, 2024 | 2 | 2024 |
| Influence Based Approaches to Algorithmic Fairness: A Closer Look | S Ghosh, P Sattigeri, I Padhi, M Nagireddy, J Chen | NeurIPS 2023 Workshop on XAI in Action: Past, Present, and Future Applications, 2023 | 2 | 2023 |
| Keeping Up with the Language Models: Systematic Benchmark Extension for Bias Auditing | I Baldini, C Yadav, M Nagireddy, P Das, KR Varshney | arXiv preprint arXiv:2305.12620, 2023 | 2* | 2023 |
| Granite Guardian | I Padhi, M Nagireddy, G Cornacchia, S Chaudhury, T Pedapati, P Dognin, ... | arXiv preprint arXiv:2412.07724, 2024 | 1 | 2024 |
| Value Alignment from Unstructured Text | I Padhi, KN Ramamurthy, P Sattigeri, M Nagireddy, P Dognin, ... | arXiv preprint arXiv:2408.10392, 2024 | 1 | 2024 |
| When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails | M Nagireddy, I Padhi, S Ghosh, P Sattigeri | arXiv preprint arXiv:2407.06323, 2024 | 1 | 2024 |
| Granite 3.0 Language Models | IBM Granite Team | | 1 | 2024 |
| Function Composition in Trustworthy Machine Learning: Implementation Choices, Insights, and Questions | M Nagireddy, M Singh, SC Hoffman, E Ju, KN Ramamurthy, KR Varshney | arXiv preprint arXiv:2302.09190, 2023 | 1 | 2023 |