TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action
While open-source multi-modal language models perform well on simple question
answering tasks, they often fail on complex questions that require multiple capabilities, such …
answering tasks, they often fail on complex questions that require multiple capabilities, such …