Mist: Multi-modal iterative spatial-temporal transformer for long-form video question answering

D Gao, L Zhou, L Ji, L Zhu, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
To build Video Question Answering (VideoQA) systems capable of assisting
humans in daily activities, seeking answers from long-form videos with diverse and complex …

Unsupervised learning for combinatorial optimization with principled objective relaxation

HP Wang, N Wu, H Yang, C Hao… - Advances in Neural …, 2022 - proceedings.neurips.cc
Using machine learning to solve combinatorial optimization (CO) problems is challenging,
especially when the data is unlabeled. This work proposes an unsupervised learning …

GeoPhy: differentiable phylogenetic inference via geometric gradients of tree topologies

T Mimori, M Hamada - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Phylogenetic inference, grounded in molecular evolution models, is essential for
understanding the evolutionary relationships in biological data. Accounting for the …

Differentiable clustering with perturbed spanning forests

L Stewart, F Bach, F Llinares-López… - Advances in Neural …, 2024 - proceedings.neurips.cc
We introduce a differentiable clustering method based on stochastic perturbations of
minimum-weight spanning forests. This allows us to include clustering in end-to-end …

A unified perspective on regularization and perturbation in differentiable subset selection

X Sun, CH Leung, Y Li, Q Wu - International Conference on …, 2023 - proceedings.mlr.press
Subset selection, i.e., finding a bunch of items from a collection to achieve specific goals, has
wide applications in information retrieval, statistics, and machine learning. To implement an …

Noise-Resilient Symbolic Regression with Dynamic Gating Reinforcement Learning

C Sun, S Shen, W Tao, D Xue, Z Zhou - arXiv preprint arXiv:2501.01085, 2025 - arxiv.org
Symbolic regression (SR) has emerged as a pivotal technique for uncovering the intrinsic
information within data and enhancing the interpretability of AI models. However, current …

MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering-Supplementary Material

D Gao, L Zhou, L Ji, L Zhu, Y Yang, MZ Shou - openaccess.thecvf.com
As illustrated in the main paper, MIST calculates multimodal attention between
segment/patch features and question features, then performs top-k hard selection over …

SKIPPOOL: Improved Sparse Hierarchical Graph Pooling with Differentiable Exploration

S Imaduwage - researchgate.net
Multiple techniques have been proposed to extract multi-resolution representations (MRR)
from graphs in Graph Representation Learning (GRL). Graph Neural Networks (GNN) …

Differentiable Clustering and Partial Fenchel-Young Losses

L Stewart, F Bach, F Llinares-López… - ICML 2023 Workshop on … - openreview.net
We introduce a differentiable clustering method based on stochastic perturbations of
minimum-weight spanning forests. This allows us to include clustering in end-to-end …

Learning Arborescence with An Efficient Inference Algorithm

N Jiang, MJ Jacobson, Y Xue - openreview.net
We consider a class of structured learning problems on arborescence (i.e., the directed
spanning tree) from the input graph. The key step involved in this problem is predicting the …