Table pre-training: A survey on model architectures, pre-training objectives, and downstream tasks

H Dong, Z Cheng, X He, M Zhou, A Zhou… - arXiv preprint arXiv …, 2022 - arxiv.org
Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs,
and various other document types, a flurry of table pre-training frameworks have been …

LLaVA-OneVision: Easy visual task transfer

B Li, Y Zhang, D Guo, R Zhang, F Li, H Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed
by consolidating our insights into data, models, and visual representations in the LLaVA …

Cambrian-1: A fully open, vision-centric exploration of multimodal LLMs

S Tong, E Brown, P Wu, S Woo, M Middepogu… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-
centric approach. While stronger language models can enhance multimodal capabilities, the …

A survey of table reasoning with large language models

X Zhang, D Wang, L Dou, Q Zhu, W Che - Frontiers of Computer Science, 2025 - Springer
Table reasoning aims to generate inference results based on the user requirement and the
provided table. Enhancing the table reasoning capability of the model can aid in obtaining …

MultiHiertt: Numerical reasoning over multi hierarchical tabular and textual data

Y Zhao, Y Li, C Li, R Zhang - arXiv preprint arXiv:2206.01347, 2022 - arxiv.org
Numerical reasoning over hybrid data containing both textual and tabular content (e.g.,
financial reports) has recently attracted much attention in the NLP community. However …

NVLM: Open frontier-class multimodal LLMs

W Dai, N Lee, B Wang, Z Yang, Z Liu, J Barker… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs)
that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary …

MM1.5: Methods, analysis & insights from multimodal LLM fine-tuning

H Zhang, M Gao, Z Gan, P Dufter, N Wenzel… - arXiv preprint arXiv …, 2024 - arxiv.org
We present MM1.5, a new family of multimodal large language models (MLLMs) designed
to enhance capabilities in text-rich image understanding, visual referring and grounding …

TableLlama: Towards open large generalist models for tables

T Zhang, X Yue, Y Li, H Sun - arXiv preprint arXiv:2311.09206, 2023 - arxiv.org
Semi-structured tables are ubiquitous. There has been a variety of tasks that aim to
automatically interpret, augment, and query tables. Current methods often require …

A survey of reasoning with foundation models

J Sun, C Zheng, E **e, Z Liu, R Chu, J Qiu, J Xu… - arxiv preprint arxiv …, 2023 - arxiv.org
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-
world settings such as negotiation, medical diagnosis, and criminal investigation. It serves …

Large language models for tabular data: Progresses and future directions

H Dong, Z Wang - Proceedings of the 47th International ACM SIGIR …, 2024 - dl.acm.org
Tables contain a significant portion of the world's structured information. The ability to
efficiently and accurately understand, process, reason about, analyze, and generate tabular …