Matching structure for dual learning

H Fei, S Wu, Y Ren, M Zhang - International Conference on …, 2022 - proceedings.mlr.press
Many natural language processing (NLP) tasks appear in dual forms, which are generally
solved by dual learning technique that models the dualities between the coupled tasks. In …

Experience grounds language

Y Bisk, A Holtzman, J Thomason, J Andreas… - arXiv preprint arXiv …, 2020 - arxiv.org
Language understanding research is held back by a failure to relate language to the
physical world it describes and to the social interactions it facilitates. Despite the incredible …

Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis

W Han, H Chen, A Gelbukh, A Zadeh… - Proceedings of the …, 2021 - dl.acm.org
Multimodal sentiment analysis aims to extract and integrate semantic information collected
from multiple modalities to recognize the expressed emotions and sentiment in multimodal …

Natural language to code translation with execution

F Shi, D Fried, M Ghazvininejad, L Zettlemoyer… - arXiv preprint arXiv …, 2022 - arxiv.org
Generative models of code, pretrained on large corpora of programs, have shown great
success in translating natural language to code (Chen et al., 2021; Austin et al., 2021; Li et …

Vokenization: Improving language understanding with contextualized, visual-grounded supervision

H Tan, M Bansal - arXiv preprint arXiv:2010.06775, 2020 - arxiv.org
Humans learn language by listening, speaking, writing, reading, and also, via interaction
with the multimodal real world. Existing language pre-training frameworks show the …

Toward understanding the communication in sperm whales

J Andreas, G Beguš, MM Bronstein, R Diamant… - iScience, 2022 - cell.com
Machine learning has been advancing dramatically over the past decade. Most strides are
human-based applications due to the availability of large-scale datasets; however …

Tree-augmented cross-modal encoding for complex-query video retrieval

X Yang, J Dong, Y Cao, X Wang, M Wang… - Proceedings of the 43rd …, 2020 - dl.acm.org
The rapid growth of user-generated videos on the Internet has intensified the need for text-
based video retrieval systems. Traditional methods mainly favor the concept-based …

Compound probabilistic context-free grammars for grammar induction

Y Kim, C Dyer, AM Rush - arXiv preprint arXiv:1906.10225, 2019 - arxiv.org
We study a formalization of the grammar induction problem that models sentences as being
generated by a compound probabilistic context-free grammar. In contrast to traditional …

HiCLIP: Contrastive language-image pretraining with hierarchy-aware attention

S Geng, J Yuan, Y Tian, Y Chen, Y Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
The success of large-scale contrastive vision-language pretraining (CLIP) has benefited
both visual recognition and multimodal content understanding. The concise design brings …

Are pre-trained language models aware of phrases? Simple but strong baselines for grammar induction

T Kim, J Choi, D Edmiston, S Lee - arXiv preprint arXiv:2002.00737, 2020 - arxiv.org
With the recent success and popularity of pre-trained language models (LMs) in natural
language processing, there has been a rise in efforts to understand their inner workings. In …