Manish Nagireddy
IBM Research AI, MIT-IBM Watson AI Lab
Verified email at ibm.com
Title
Cited by
Year
Exploring How Machine Learning Practitioners (Try To) Use Fairness Toolkits
WH Deng, M Nagireddy, MSA Lee, J Singh, ZS Wu, K Holstein, H Zhu
2022 ACM Conference on Fairness, Accountability, and Transparency, 473-484, 2022
Cited by: 92; Year: 2022
Detectors for safe and reliable LLMs: Implementations, uses, and limitations
S Achintalwar, AA Garcia, A Anaby-Tavor, I Baldini, SE Berger, ...
arXiv preprint arXiv:2403.06009, 2024
Cited by: 14; Year: 2024
SocialStigmaQA: A Benchmark to Uncover Stigma Amplification in Generative Language Models
M Nagireddy, L Chiazor, M Singh, I Baldini
Proceedings of the 2024 AAAI Conference on Artificial Intelligence, 2023
Cited by: 14; Year: 2023
A sandbox tool to bias (Stress)-test fairness algorithms
NJ Akpinar, M Nagireddy, L Stapleton, HF Cheng, H Zhu, S Wu, H Heidari
EAAMO 2022 Poster, 2022
Cited by: 13; Year: 2022
The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers
H Mozannar, V Chen, M Alsobay, S Das, S Zhao, D Wei, M Nagireddy, ...
arXiv preprint arXiv:2404.02806, 2024
Cited by: 9*; Year: 2024
Language Models in Dialogue: Conversational Maxims for Human-AI Interactions
E Miehling, M Nagireddy, P Sattigeri, EM Daly, D Piorkowski, JT Richards
arXiv preprint arXiv:2403.15115, 2024
Cited by: 8; Year: 2024
Comvas: Contextual moral values alignment system
I Padhi, P Dognin, J Rios, R Luss, S Achintalwar, M Riemer, M Liu, ...
Proc. Int. Joint Conf. Artif. Intell, 8759-8762, 2024
Cited by: 4; Year: 2024
Multi-Level Explanations for Generative Language Models
LM Paes, D Wei, HJ Do, H Strobelt, R Luss, A Dhurandhar, M Nagireddy, ...
arXiv preprint arXiv:2403.14459, 2024
Cited by: 4; Year: 2024
Programming refusal with conditional activation steering
BW Lee, I Padhi, KN Ramamurthy, E Miehling, P Dognin, M Nagireddy, ...
arXiv preprint arXiv:2409.05907, 2024
Cited by: 3; Year: 2024
Alignment studio: Aligning large language models to particular contextual regulations
S Achintalwar, I Baldini, D Bouneffouf, J Byamugisha, M Chang, P Dognin, ...
IEEE Internet Computing, 2024
Cited by: 3; Year: 2024
Contextual Moral Value Alignment Through Context-Based Aggregation
P Dognin, J Rios, R Luss, I Padhi, MD Riemer, M Liu, P Sattigeri, ...
arXiv preprint arXiv:2403.12805, 2024
Cited by: 3; Year: 2024
DARE to Diversify: DAta Driven and Diverse LLM REd Teaming
M Nagireddy, B Guillén Pegueroles, I Baldini
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and …, 2024
Cited by: 2; Year: 2024
Prompt Templates: A Methodology for Improving Manual Red Teaming Performance
B Dominique, D Piorkowski, M Nagireddy, I Baldini
ACM CHI Conference on Human Factors in Computing Systems, 2024
Cited by: 2; Year: 2024
Influence Based Approaches to Algorithmic Fairness: A Closer Look
S Ghosh, P Sattigeri, I Padhi, M Nagireddy, J Chen
NeurIPS 2023 Workshop on XAI in Action: Past, Present, and Future Applications, 2023
Cited by: 2; Year: 2023
Keeping Up with the Language Models: Systematic Benchmark Extension for Bias Auditing
I Baldini, C Yadav, M Nagireddy, P Das, KR Varshney
arXiv preprint arXiv:2305.12620, 2023
Cited by: 2*; Year: 2023
Granite Guardian
I Padhi, M Nagireddy, G Cornacchia, S Chaudhury, T Pedapati, P Dognin, ...
arXiv preprint arXiv:2412.07724, 2024
Cited by: 1; Year: 2024
Value Alignment from Unstructured Text
I Padhi, KN Ramamurthy, P Sattigeri, M Nagireddy, P Dognin, ...
arXiv preprint arXiv:2408.10392, 2024
Cited by: 1; Year: 2024
When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails
M Nagireddy, I Padhi, S Ghosh, P Sattigeri
arXiv preprint arXiv:2407.06323, 2024
Cited by: 1; Year: 2024
Granite 3.0 Language Models
IBM Granite Team
Cited by: 1; Year: 2024
Function Composition in Trustworthy Machine Learning: Implementation Choices, Insights, and Questions
M Nagireddy, M Singh, SC Hoffman, E Ju, KN Ramamurthy, KR Varshney
arXiv preprint arXiv:2302.09190, 2023
Cited by: 1; Year: 2023