Matching structure for dual learning
Many natural language processing (NLP) tasks appear in dual forms, which are generally
solved by the dual learning technique that models the dualities between the coupled tasks. In …
Experience grounds language
Language understanding research is held back by a failure to relate language to the
physical world it describes and to the social interactions it facilitates. Despite the incredible …
Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis
Multimodal sentiment analysis aims to extract and integrate semantic information collected
from multiple modalities to recognize the expressed emotions and sentiment in multimodal …
Natural language to code translation with execution
Generative models of code, pretrained on large corpora of programs, have shown great
success in translating natural language to code (Chen et al., 2021; Austin et al., 2021; Li et …
Vokenization: Improving language understanding with contextualized, visual-grounded supervision
Humans learn language by listening, speaking, writing, reading, and also, via interaction
with the multimodal real world. Existing language pre-training frameworks show the …
Toward understanding the communication in sperm whales
Machine learning has been advancing dramatically over the past decade. Most strides are in
human-based applications due to the availability of large-scale datasets; however …
Tree-augmented cross-modal encoding for complex-query video retrieval
The rapid growth of user-generated videos on the Internet has intensified the need for text-
based video retrieval systems. Traditional methods mainly favor the concept-based …
Compound probabilistic context-free grammars for grammar induction
We study a formalization of the grammar induction problem that models sentences as being
generated by a compound probabilistic context-free grammar. In contrast to traditional …
HiCLIP: Contrastive language-image pretraining with hierarchy-aware attention
The success of large-scale contrastive vision-language pretraining (CLIP) has benefited
both visual recognition and multimodal content understanding. The concise design brings …
Are pre-trained language models aware of phrases? Simple but strong baselines for grammar induction
With the recent success and popularity of pre-trained language models (LMs) in natural
language processing, there has been a rise in efforts to understand their inner workings. In …