Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Searching for efficient transformers for language modeling

D So, W Mańke, H Liu, Z Dai… - Advances in neural …, 2021 - proceedings.neurips.cc
Large Transformer models have been central to recent advances in natural language
processing. The training and inference costs of these models, however, have grown rapidly …

AutoML-Zero: Evolving machine learning algorithms from scratch

E Real, C Liang, D So, Q Le - International conference on …, 2020 - proceedings.mlr.press
Machine learning research has advanced in multiple aspects, including model
structures and learning methods. The effort to automate such research, known as AutoML …

Multibench: Multiscale benchmarks for multimodal representation learning

PP Liang, Y Lyu, X Fan, Z Wu, Y Cheng… - Advances in neural …, 2021 - ncbi.nlm.nih.gov
Learning multimodal representations involves integrating information from multiple
heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world …

Tensor methods in computer vision and deep learning

Y Panagakis, J Kossaifi, GG Chrysos… - Proceedings of the …, 2021 - ieeexplore.ieee.org
Tensors, or multidimensional arrays, are data structures that can naturally represent visual
data of multiple dimensions. Inherently able to efficiently capture structured, latent semantic …

Quantifying & modeling multimodal interactions: An information decomposition framework

PP Liang, Y Cheng, X Fan, CK Ling… - Advances in …, 2024 - proceedings.neurips.cc
The recent explosion of interest in multimodal applications has resulted in a wide selection
of datasets and methods for representing and integrating information from different …

Advancing RNN transducer technology for speech recognition

G Saon, Z Tüske, D Bolanos… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
We investigate a set of techniques for RNN Transducers (RNN-Ts) that were instrumental in
lowering the word error rate on three different tasks (Switchboard 300 hours, conversational …

Universal Hopfield networks: A general framework for single-shot associative memory models

B Millidge, T Salvatori, Y Song… - International …, 2022 - proceedings.mlr.press
A large number of neural network models of associative memory have been proposed in the
literature. These include the classical Hopfield networks (HNs), sparse distributed memories …

Hit and lead discovery with explorative RL and fragment-based molecule generation

S Yang, D Hwang, S Lee, S Ryu… - Advances in Neural …, 2021 - proceedings.neurips.cc
Recently, utilizing reinforcement learning (RL) to generate molecules with desired properties
has been highlighted as a promising strategy for drug design. Molecular docking program--a …