MIST: Multi-modal iterative spatial-temporal transformer for long-form video question answering
To build Video Question Answering (VideoQA) systems capable of assisting
humans in daily activities, seeking answers from long-form videos with diverse and complex …
Unsupervised learning for combinatorial optimization with principled objective relaxation
Using machine learning to solve combinatorial optimization (CO) problems is challenging,
especially when the data is unlabeled. This work proposes an unsupervised learning …
GeoPhy: differentiable phylogenetic inference via geometric gradients of tree topologies
Phylogenetic inference, grounded in molecular evolution models, is essential for
understanding the evolutionary relationships in biological data. Accounting for the …
Differentiable clustering with perturbed spanning forests
We introduce a differentiable clustering method based on stochastic perturbations of
minimum-weight spanning forests. This allows us to include clustering in end-to-end …
A unified perspective on regularization and perturbation in differentiable subset selection
Subset selection, i.e., finding a bunch of items from a collection to achieve specific goals, has
wide applications in information retrieval, statistics, and machine learning. To implement an …
Noise-Resilient Symbolic Regression with Dynamic Gating Reinforcement Learning
Symbolic regression (SR) has emerged as a pivotal technique for uncovering the intrinsic
information within data and enhancing the interpretability of AI models. However, current …
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering - Supplementary Material
As illustrated in the main paper, MIST calculates multimodal attention between
segment/patch features and question features, then performs top-k hard selection over …
SKIPPOOL: Improved Sparse Hierarchical Graph Pooling with Differentiable Exploration
S Imaduwage - researchgate.net
Multiple techniques have been proposed to extract multi-resolution representations (MRR)
from graphs in Graph Representation Learning (GRL). Graph Neural Networks (GNN) …
Differentiable Clustering and Partial Fenchel-Young Losses
We introduce a differentiable clustering method based on stochastic perturbations of
minimum-weight spanning forests. This allows us to include clustering in end-to-end …
Learning Arborescence with An Efficient Inference Algorithm
N Jiang, MJ Jacobson, Y Xue - openreview.net
We consider a class of structured learning problems on arborescence (i.e., the directed
spanning tree) from the input graph. The key step involved in this problem is predicting the …