From text to insight: large language models for materials science data extraction

M Schilling-Wilhelmi, M Ríos-García, S Shabih… - arXiv preprint arXiv …, 2024 - arxiv.org
The vast majority of materials science knowledge exists in unstructured natural language,
yet structured data is crucial for innovative and systematic materials design. Traditionally, the …

Manipulate-anything: Automating real-world robots using vision-language models

J Duan, W Yuan, W Pumacay, YR Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale endeavors like RT-1 and widespread community efforts such as Open-X-Embodiment
have contributed to growing the scale of robot demonstration data. However, there is still an …

Open problems in technical AI governance

A Reuel, B Bucknall, S Casper, T Fist, L Soder… - arXiv preprint arXiv …, 2024 - arxiv.org
AI progress is creating a growing range of risks and opportunities, but it is often unclear how
they should be navigated. In many cases, the barriers and uncertainties faced are at least …

Audio-Language Datasets of Scenes and Events: A Survey

G Wijngaard, E Formisano, M Esposito… - IEEE …, 2025 - ieeexplore.ieee.org
Audio-language models (ALMs) generate linguistic descriptions of sound-producing events
and scenes. Advances in dataset creation and computational power have led to significant …

Advancing multimodal medical capabilities of Gemini

L Yang, S Xu, A Sellergren, T Kohlberger… - arXiv preprint arXiv …, 2024 - arxiv.org
Many clinical tasks require an understanding of specialized data, such as medical images
and genomics, which is not typically found in general-purpose large multimodal models …

Emergence of hidden capabilities: Exploring learning dynamics in concept space

CF Park, M Okawa, A Lee, ES Lubana… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern generative models demonstrate impressive capabilities, likely stemming from an
ability to identify and manipulate abstract concepts underlying their training data. However …

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

Z Qin, D Chen, W Zhang, L Yao, Y Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of large language models (LLMs) has been witnessed in recent
years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …

Multimodal document analytics for banking process automation

C Gerling, S Lessmann - Information Fusion, 2025 - Elsevier
Traditional banks are increasingly challenged by FinTechs, particularly in leveraging
advanced technologies to enhance operational efficiency. Our study addresses this by …

On Erroneous Agreements of CLIP Image Embeddings

S Li, PW Koh, SS Du - arXiv preprint arXiv:2411.05195, 2024 - arxiv.org
Recent research suggests that the failures of Vision-Language Models (VLMs) at visual
reasoning often stem from erroneous agreements--when semantically distinct images are …

Context-aware multimodal pretraining

K Roth, Z Akata, D Damen, I Balažević… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale multimodal representation learning successfully optimizes for zero-shot transfer
at test time. Yet the standard pretraining paradigm (contrastive learning on large amounts of …