Foregrounding artist opinions: A survey study on transparency, ownership, and fairness in AI generative art

J Lovato, JW Zimmerman, I Smith, P Dodds… - Proceedings of the …, 2024 - ojs.aaai.org
Generative AI tools are used to create art-like outputs and sometimes aid in the creative
process. These tools have potential benefits for artists, but they also have the potential to …

On the self-verification limitations of large language models on reasoning and planning tasks

K Stechly, K Valmeekam, S Kambhampati - arXiv preprint arXiv …, 2024 - arxiv.org
There has been considerable divergence of opinion on the reasoning abilities of Large
Language Models (LLMs). While the initial optimism that reasoning might emerge …

Eureka: Evaluating and understanding large foundation models

V Balachandran, J Chen, N Joshi, B Nushi… - arXiv preprint arXiv …, 2024 - arxiv.org
Rigorous and reproducible evaluation is critical for assessing the state of the art and for
guiding scientific advances in Artificial Intelligence. Evaluation is challenging in practice due …

“I Want It That Way”: Enabling Interactive Decision Support Using Large Language Models and Constraint Programming

C Lawless, J Schoeffer, L Le, K Rowan, S Sen… - ACM Transactions on …, 2024 - dl.acm.org
A critical factor in the success of many decision support systems is the accurate modeling of
user preferences. Psychology research has demonstrated that users often develop their …

Chain of thoughtlessness: An analysis of CoT in planning

K Stechly, K Valmeekam, S Kambhampati - arXiv preprint arXiv …, 2024 - arxiv.org
Large language model (LLM) performance on reasoning problems typically does not
generalize out of distribution. Previous work has claimed that this can be mitigated by …

BENCHAGENTS: Automated Benchmark Creation with Agent Interaction

N Butt, V Chandrasekaran, N Joshi, B Nushi… - arXiv preprint arXiv …, 2024 - arxiv.org
Evaluations are limited by benchmark availability. As models evolve, there is a need to
create benchmarks that can measure progress on new generative capabilities. However …

Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models

KU Qasim, J Zhang, T Alsahfi, AUR Butt - arXiv preprint arXiv:2501.02026, 2025 - arxiv.org
Enhancing the reasoning capabilities of Large Language Models remains a critical
challenge in artificial intelligence. We introduce RDoLT, Recursive Decomposition of Logical …

The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests

L Madmoni, A Zait, I Labzovsky, D Karmon - arXiv preprint arXiv …, 2024 - arxiv.org
Generative AI agents are often expected to respond to complex user requests that have No
One Right Answer (NORA), e.g., "design a vegetarian meal plan below 1800 calories". Such …

From Instructions to Constraints: Language Model Alignment with Automatic Constraint Verification

F Wang, C Shang, S Jain, S Wang, Q Ning… - arXiv preprint arXiv …, 2024 - arxiv.org
User alignment is crucial for adapting general-purpose language models (LMs) to
downstream tasks, but human annotations are often not available for all types of instructions …

Towards Trustworthy Machine Learning: An Integer Programming Approach

CA Lawless - 2024 - search.proquest.com
Despite the proliferation of machine learning (ML) in a multitude of applications, current
black-box models, such as deep learning, remain hard to understand, critique, and judge by …