TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action

Z Ma, J Zhang, Z Liu, J Zhang, J Tan, M Shu… - arXiv preprint arXiv …, 2024 - arxiv.org
While open-source multi-modal language models perform well on simple question
answering tasks, they often fail on complex questions that require multiple capabilities, such …