- Academic Search

From text to insight: large language models for materials science data extraction

M Schilling-Wilhelmi, M Ríos-García, S Shabih… - arxiv preprint arxiv …, 2024 - arxiv.org

The vast majority of materials science knowledge exists in unstructured natural language,
yet structured data is crucial for innovative and systematic materials design. Traditionally, the …

Cited by 10

Manipulate-anything: Automating real-world robots using vision-language models

J Duan, W Yuan, W Pumacay, YR Wang… - arxiv preprint arxiv …, 2024 - arxiv.org

Large-scale endeavors like and widespread community efforts such as Open-X-Embodiment
have contributed to growing the scale of robot demonstration data. However, there is still an …

Cited by 20

Open problems in technical AI governance

A Reuel, B Bucknall, S Casper, T Fist, L Soder… - arxiv preprint arxiv …, 2024 - arxiv.org

AI progress is creating a growing range of risks and opportunities, but it is often unclear how
they should be navigated. In many cases, the barriers and uncertainties faced are at least …

Cited by 22

Audio-Language Datasets of Scenes and Events: A Survey

G Wijngaard, E Formisano, M Esposito… - IEEE …, 2025 - ieeexplore.ieee.org

Audio-language models (ALMs) generate linguistic descriptions of sound-producing events
and scenes. Advances in dataset creation and computational power have led to significant …

Cited by 2

Advancing multimodal medical capabilities of Gemini

L Yang, S Xu, A Sellergren, T Kohlberger… - arxiv preprint arxiv …, 2024 - arxiv.org

Many clinical tasks require an understanding of specialized data, such as medical images
and genomics, which is not typically found in general-purpose large multimodal models …

Cited by 26

Emergence of hidden capabilities: Exploring learning dynamics in concept space

CF Park, M Okawa, A Lee, ES Lubana… - arxiv preprint arxiv …, 2024 - arxiv.org

Modern generative models demonstrate impressive capabilities, likely stemming from an
ability to identify and manipulate abstract concepts underlying their training data. However …

Cited by 5

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

Z Qin, D Chen, W Zhang, L Yao, Y Huang… - arxiv preprint arxiv …, 2024 - arxiv.org

The rapid development of large language models (LLMs) has been witnessed in recent
years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …

Cited by 5

Multimodal document analytics for banking process automation

C Gerling, S Lessmann - Information Fusion, 2025 - Elsevier

Traditional banks are increasingly challenged by FinTechs, particularly in leveraging
advanced technologies to enhance operational efficiency. Our study addresses this by …

Cited by 6

On Erroneous Agreements of CLIP Image Embeddings

S Li, PW Koh, SS Du - arxiv preprint arxiv:2411.05195, 2024 - arxiv.org

Recent research suggests that the failures of Vision-Language Models (VLMs) at visual
reasoning often stem from erroneous agreements--when semantically distinct images are …

Cited by 2

Context-aware multimodal pretraining

K Roth, Z Akata, D Damen, I Balažević… - arxiv preprint arxiv …, 2024 - arxiv.org

Large-scale multimodal representation learning successfully optimizes for zero-shot transfer
at test time. Yet the standard pretraining paradigm (contrastive learning on large amounts of …

Cited by 1
