Multimodal machine learning: A survey and taxonomy

T Baltrušaitis, C Ahuja… - IEEE transactions on …, 2018 - ieeexplore.ieee.org
Our experience of the world is multimodal: we see objects, hear sounds, feel texture, smell
odors, and taste flavors. Modality refers to the way in which something happens or is …

Semantic memory: A review of methods, models, and current challenges

AA Kumar - Psychonomic Bulletin & Review, 2021 - Springer
Adult semantic memory has been traditionally conceptualized as a relatively static memory
system that consists of knowledge about the world, concepts, and symbols. Considerable …

Zero-shot learning through cross-modal transfer

R Socher, M Ganjoo… - Advances in neural …, 2013 - proceedings.neurips.cc
This work introduces a model that can recognize objects in images even if no training data is
available for the object class. The only necessary knowledge about unseen categories …

Multi-modal machine learning in engineering design: A review and future directions

B Song, R Zhou, F Ahmed - … of Computing and …, 2024 - asmedigitalcollection.asme.org
In the rapidly advancing field of multi-modal machine learning (MMML), the convergence of
multiple data modalities has the potential to reshape various applications. This paper …

Distributional models of word meaning

A Lenci - Annual Review of Linguistics, 2018 - annualreviews.org
Distributional semantics is a usage-based model of meaning, based on the assumption that
the statistical distribution of linguistic items in context plays a key role in characterizing their …

Multimodal distributional semantics

E Bruni, NK Tran, M Baroni - Journal of Artificial Intelligence Research, 2014 - jair.org
Distributional semantic models derive computational representations of word meaning from
the patterns of co-occurrence of words in text. Such models have been a success story of …

Grounding action descriptions in videos

M Regneri, M Rohrbach, D Wetzel, S Thater… - Transactions of the …, 2013 - direct.mit.edu
Recent work has shown that the integration of visual information into text-based models can
substantially improve model predictions, but so far only visual information extracted from …

Combining language and vision with a multimodal skip-gram model

A Lazaridou, NT Pham, M Baroni - arXiv preprint arXiv:1501.02598, 2015 - arxiv.org
We extend the SKIP-GRAM model of Mikolov et al. (2013a) by taking visual information into
account. Like SKIP-GRAM, our multimodal models (MMSKIP-GRAM) build vector-based …

Frege in space: A program for compositional distributional semantics

M Baroni, R Bernardi, R Zamparelli - Linguistic Issues in language …, 2014 - iris.unitn.it
The lexicon of any natural language encodes a huge number of distinct word meanings. Just
to understand this article, you will need to know what thousands of words mean. The space …

Distributional semantics in technicolor

E Bruni, G Boleda, M Baroni… - Proceedings of the 50th …, 2012 - aclanthology.org
Our research aims at building computational models of word meaning that are perceptually
grounded. Using computer vision techniques, we build visual and multimodal distributional …