Table pre-training: A survey on model architectures, pre-training objectives, and downstream tasks
Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs,
and various other document types, a flurry of table pre-training frameworks have been …
LLaVA-OneVision: Easy visual task transfer
We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed
by consolidating our insights into data, models, and visual representations in the LLaVA …
Cambrian-1: A fully open, vision-centric exploration of multimodal LLMs
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-
centric approach. While stronger language models can enhance multimodal capabilities, the …
A survey of table reasoning with large language models
Table reasoning aims to generate inference results based on the user requirement and the
provided table. Enhancing the table reasoning capability of the model can aid in obtaining …
MultiHiertt: Numerical reasoning over multi hierarchical tabular and textual data
Numerical reasoning over hybrid data containing both textual and tabular content (e.g.,
financial reports) has recently attracted much attention in the NLP community. However …
NVLM: Open frontier-class multimodal LLMs
We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs)
that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary …
MM1.5: Methods, analysis & insights from multimodal LLM fine-tuning
We present MM1.5, a new family of multimodal large language models (MLLMs) designed
to enhance capabilities in text-rich image understanding, visual referring and grounding …
TableLlama: Towards open large generalist models for tables
Semi-structured tables are ubiquitous. There has been a variety of tasks that aim to
automatically interpret, augment, and query tables. Current methods often require …
A survey of reasoning with foundation models
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-
world settings such as negotiation, medical diagnosis, and criminal investigation. It serves …
Large language models for tabular data: Progresses and future directions
Tables contain a significant portion of the world's structured information. The ability to
efficiently and accurately understand, process, reason about, analyze, and generate tabular …