Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 75

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 151 (all 2 versions)

MiniCPM-V: A GPT-4V level MLLM on your phone

Y Yao, T Yu, A Zhang, C Wang, J Cui, H Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org

The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally
reshaped the landscape of AI research and industry, shedding light on a promising path …

Cited by 180 (all 2 versions)

MathVerse: Does your multi-modal LLM truly see the diagrams in visual math problems?

R Zhang, D Jiang, Y Zhang, H Lin, Z Guo, P Qiu… - … on Computer Vision, 2024 - Springer

The remarkable progress of Multi-modal Large Language Models (MLLMs) has gained
unparalleled attention. However, their capabilities in visual math problem-solving remain …

Cited by 105 (all 2 versions)

TheoremQA: A theorem-driven question answering dataset

W Chen, M Yin, M Ku, P Lu, Y Wan, X Ma… - Proceedings of the …, 2023 - aclanthology.org

The recent LLMs like GPT-4 and PaLM-2 have made tremendous progress in solving
fundamental math problems like GSM8K by achieving over 90% accuracy. However, their …

Cited by 95 (all 8 versions)

A survey of deep learning for mathematical reasoning

P Lu, L Qiu, W Yu, S Welleck, KW Chang - arXiv preprint arXiv:2212.10535, 2022 - arxiv.org

Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in
various fields, including science, engineering, finance, and everyday life. The development …

Cited by 127 (all 6 versions)

RS5M and GeoRSCLIP: A large scale vision-language dataset and a large vision-language model for remote sensing

Z Zhang, T Zhao, Y Guo, J Yin - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Pretrained vision-language models (VLMs) utilizing extensive image–text paired data have
demonstrated unprecedented image–text association capabilities, achieving remarkable …

Cited by 28

Large language models for mathematical reasoning: Progresses and challenges

J Ahn, R Verma, R Lou, D Liu, R Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Mathematical reasoning serves as a cornerstone for assessing the fundamental cognitive
capabilities of human intelligence. In recent times, there has been a notable surge in the …

Cited by 133 (all 4 versions)

Document understanding dataset and evaluation (DUDE)

J Van Landeghem, R Tito… - Proceedings of the …, 2023 - openaccess.thecvf.com

We call on the Document AI (DocAI) community to re-evaluate current methodologies and
embrace the challenge of creating more practically-oriented benchmarks. Document …

Cited by 51 (all 9 versions)

UniGeo: Unifying geometry logical reasoning via reformulating mathematical expression

J Chen, T Li, J Qin, P Lu, L Lin, C Chen… - arXiv preprint arXiv …, 2022 - arxiv.org

Geometry problem solving is a well-recognized testbed for evaluating the high-level multi-modal
reasoning capability of deep models. In most existing works, two main geometry …

Cited by 90 (all 4 versions)

GeoQA: A geometric question answering benchmark towards multimodal numerical reasoning