Revisiting class-incremental learning with pre-trained models: Generalizability and adaptivity are all you need
Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting
old ones. Traditional CIL models are trained from scratch to continually acquire knowledge …
VideoPoet: A large language model for zero-shot video generation
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …
VideoPrism: A foundational visual encoder for video understanding
We introduce VideoPrism, a general-purpose video encoder that tackles diverse video
understanding tasks with a single frozen model. We pretrain VideoPrism on a …
Language as the medium: Multimodal video classification through text only
Despite an exciting new wave of multimodal machine learning models, current approaches
still struggle to interpret the complex contextual relationships between the different …
Human activity recognition: A review of deep learning-based methods
Human Activity Recognition (HAR) covers methods for automatically identifying
human activities from a stream of data. End-users of HAR methods cover a range of sectors …
Sparse MoE as a new treatment: Addressing forgetting, fitting, learning issues in multi-modal multi-task learning
Sparse Mixture-of-Experts (SMoE) is a promising paradigm that can be easily tailored for
multi-task learning. Its conditional computing nature allows us to organically allocate …
Towards a Generalist and Blind RGB-X Tracker
With the emergence of a single large model capable of successfully solving a multitude of
tasks in NLP, there has been growing research interest in achieving similar goals in …
Foundation Models for Video Understanding: A Survey
Video Foundation Models (ViFMs) aim to learn a general-purpose representation for various
video understanding tasks. Leveraging large-scale datasets and powerful models, ViFMs …
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
Mixture-of-experts (MoE) models have achieved excellent results in many tasks. However,
conventional MoE models are often very large, making them challenging to deploy on …