From text to insight: large language models for materials science data extraction
The vast majority of materials science knowledge exists in unstructured natural language,
yet structured data is crucial for innovative and systematic materials design. Traditionally, the …
Manipulate-anything: Automating real-world robots using vision-language models
Large-scale endeavors like and widespread community efforts such as Open-X-Embodiment
have contributed to growing the scale of robot demonstration data. However, there is still an …
Open problems in technical AI governance
AI progress is creating a growing range of risks and opportunities, but it is often unclear how
they should be navigated. In many cases, the barriers and uncertainties faced are at least …
Audio-Language Datasets of Scenes and Events: A Survey
Audio-language models (ALMs) generate linguistic descriptions of sound-producing events
and scenes. Advances in dataset creation and computational power have led to significant …
Advancing multimodal medical capabilities of Gemini
Many clinical tasks require an understanding of specialized data, such as medical images
and genomics, which is not typically found in general-purpose large multimodal models …
Emergence of hidden capabilities: Exploring learning dynamics in concept space
Modern generative models demonstrate impressive capabilities, likely stemming from an
ability to identify and manipulate abstract concepts underlying their training data. However …
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
The rapid development of large language models (LLMs) has been witnessed in recent
years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …
Multimodal document analytics for banking process automation
Traditional banks are increasingly challenged by FinTechs, particularly in leveraging
advanced technologies to enhance operational efficiency. Our study addresses this by …
On Erroneous Agreements of CLIP Image Embeddings
Recent research suggests that the failures of Vision-Language Models (VLMs) at visual
reasoning often stem from erroneous agreements--when semantically distinct images are …
Context-aware multimodal pretraining
Large-scale multimodal representation learning successfully optimizes for zero-shot transfer
at test time. Yet the standard pretraining paradigm (contrastive learning on large amounts of …