Foundations & trends in multimodal machine learning: Principles, challenges, and open questions
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
Searching for efficient transformers for language modeling
Large Transformer models have been central to recent advances in natural language
processing. The training and inference costs of these models, however, have grown rapidly …
processing. The training and inference costs of these models, however, have grown rapidly …
Automl-zero: Evolving machine learning algorithms from scratch
Abstract Machine learning research has advanced in multiple aspects, including model
structures and learning methods. The effort to automate such research, known as AutoML …
structures and learning methods. The effort to automate such research, known as AutoML …
[HTML][HTML] Multibench: Multiscale benchmarks for multimodal representation learning
Learning multimodal representations involves integrating information from multiple
heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world …
heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world …
Tensor methods in computer vision and deep learning
Tensors, or multidimensional arrays, are data structures that can naturally represent visual
data of multiple dimensions. Inherently able to efficiently capture structured, latent semantic …
data of multiple dimensions. Inherently able to efficiently capture structured, latent semantic …
Quantifying & modeling multimodal interactions: An information decomposition framework
The recent explosion of interest in multimodal applications has resulted in a wide selection
of datasets and methods for representing and integrating information from different …
of datasets and methods for representing and integrating information from different …
Advancing RNN transducer technology for speech recognition
We investigate a set of techniques for RNN Transducers (RNN-Ts) that were instrumental in
lowering the word error rate on three different tasks (Switchboard 300 hours, conversational …
lowering the word error rate on three different tasks (Switchboard 300 hours, conversational …
Universal hopfield networks: A general framework for single-shot associative memory models
A large number of neural network models of associative memory have been proposed in the
literature. These include the classical Hopfield networks (HNs), sparse distributed memories …
literature. These include the classical Hopfield networks (HNs), sparse distributed memories …
Hit and lead discovery with explorative rl and fragment-based molecule generation
Recently, utilizing reinforcement learning (RL) to generate molecules with desired properties
has been highlighted as a promising strategy for drug design. Molecular docking program--a …
has been highlighted as a promising strategy for drug design. Molecular docking program--a …