VideoPoet: A large language model for zero-shot video generation

D Kondratyuk, L Yu, X Gu, J Lezama, J Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …
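
As a toy illustration of the decoder-only idea the abstract names, the Python sketch below treats conditioning signals and generated video as tokens in one autoregressive sequence; the vocabulary size, the next_token_logits stub, and the generate helper are hypothetical stand-ins, not VideoPoet's actual components.

# Toy sketch of a decoder-only multimodal token model: conditioning and
# video tokens share one vocabulary and one autoregressive sequence.
# The "model" below is a random stand-in, not VideoPoet itself.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 256                       # unified codebook over all modalities (assumed size)

def next_token_logits(seq: list[int]) -> np.ndarray:
    """Stand-in for a transformer decoder scoring the next token."""
    rng2 = np.random.default_rng(hash(tuple(seq)) % (2**32))
    return rng2.normal(size=VOCAB)

def generate(condition_tokens: list[int], n_video_tokens: int) -> list[int]:
    """Zero-shot generation: prefix with conditioning tokens, then sample."""
    seq = list(condition_tokens)
    for _ in range(n_video_tokens):
        logits = next_token_logits(seq)
        p = np.exp(logits - logits.max()); p /= p.sum()
        seq.append(int(rng.choice(VOCAB, p=p)))
    return seq[len(condition_tokens):]   # the generated video tokens

video_tokens = generate(condition_tokens=[1, 7, 42], n_video_tokens=8)
print(video_tokens)  # would be decoded back to pixels by a video tokenizer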

Revisiting class-incremental learning with pre-trained models: Generalizability and adaptivity are all you need

DW Zhou, ZW Cai, HJ Ye, DC Zhan, Z Liu - International Journal of …, 2024 - Springer
Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting
old ones. Traditional CIL models are trained from scratch to continually acquire knowledge …
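
As a rough illustration of why a pre-trained backbone changes the CIL picture, the Python sketch below freezes a stand-in encoder and grows a nearest-prototype classifier one task at a time; extract_features, learn_task, and the random projection are hypothetical, not the paper's method.

# Minimal sketch: class-incremental learning on top of a frozen
# pre-trained encoder, using class-mean prototypes as the classifier.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 16))          # stand-in for a frozen pre-trained encoder

def extract_features(x: np.ndarray) -> np.ndarray:
    """Frozen embedding: never updated across tasks, so nothing is forgotten."""
    return x @ W

prototypes: dict[int, np.ndarray] = {}  # class id -> mean embedding

def learn_task(xs: np.ndarray, ys: np.ndarray) -> None:
    """Each incremental task only *adds* prototypes; old ones stay untouched."""
    feats = extract_features(xs)
    for c in np.unique(ys):
        prototypes[int(c)] = feats[ys == c].mean(axis=0)

def predict(x: np.ndarray) -> int:
    """Nearest-prototype classification over all classes seen so far."""
    f = extract_features(x)
    return min(prototypes, key=lambda c: np.linalg.norm(f - prototypes[c]))

# Two incremental "tasks", each bringing new classes:
learn_task(rng.normal(size=(20, 64)), np.array([0] * 10 + [1] * 10))
learn_task(rng.normal(size=(20, 64)), np.array([2] * 10 + [3] * 10))
print(predict(rng.normal(size=64)))    # some class in {0, 1, 2, 3}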

VideoPrism: A foundational visual encoder for video understanding

L Zhao, NB Gundavarapu, L Yuan, H Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce VideoPrism, a general-purpose video encoder that tackles diverse video
understanding tasks with a single frozen model. We pretrain VideoPrism on a …
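
The "single frozen model" setup can be pictured as one shared embedding feeding lightweight per-task probes; the Python sketch below is a generic stand-in with illustrative shapes, not VideoPrism's architecture.

# Sketch of the "single frozen encoder, many tasks" pattern: one shared
# embedding feeds tiny per-task heads, and only the heads are trained.
import numpy as np

rng = np.random.default_rng(0)
ENC = rng.normal(size=(128, 32))            # frozen video encoder (stand-in)

def encode(video: np.ndarray) -> np.ndarray:
    return video @ ENC                       # never updated

heads = {                                    # lightweight linear probes, one per task
    "classification": rng.normal(size=(32, 10)),
    "retrieval":      rng.normal(size=(32, 16)),
}

clip = rng.normal(size=128)                  # a "video" as a flat feature vector
z = encode(clip)
for task, head in heads.items():
    print(task, (z @ head).shape)            # task-specific outputs from one z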

Language as the medium: Multimodal video classification through text only

L Hanu, AL Verő, J Thewlis - arXiv preprint arXiv:2309.10783, 2023 - arxiv.org
Despite an exciting new wave of multimodal machine learning models, current approaches
still struggle to interpret the complex contextual relationships between the different …
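
One way to read the text-only recipe: every modality is first verbalized, and only the resulting string reaches the classifier. The sketch below is a minimal mock-up; describe_frames, transcribe_audio, and the keyword scorer are hypothetical placeholders for real captioning, ASR, and LLM components.

# Sketch of "classification through text only": each modality is turned
# into text, and the classifier sees only that text.
def describe_frames(frames) -> str:
    # Hypothetical captioner output for a cooking clip:
    return "a person cooking: chopping vegetables on a wooden board"

def transcribe_audio(audio) -> str:
    # Hypothetical ASR output:
    return "now add a pinch of salt"

def classify_from_text(description: str, labels: list[str]) -> str:
    """Stand-in for an LLM prompt; here, naive keyword overlap."""
    scores = {lab: sum(w in description for w in lab.split()) for lab in labels}
    return max(scores, key=scores.get)

video_as_text = f"Frames: {describe_frames(None)} Audio: {transcribe_audio(None)}"
print(classify_from_text(video_as_text, ["cooking", "playing sports", "driving"]))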

Human activity recognition: A review of deep learning‐based methods

SJ Dutta, T Boongoen, R Zwiggelaar - IET Computer Vision, 2025 - Wiley Online Library
Human Activity Recognition (HAR) covers methods for automatically identifying
human activities from a stream of data. End‐users of HAR methods cover a range of sectors …

Sparse MoE as a new treatment: Addressing forgetting, fitting, learning issues in multi-modal multi-task learning

J Peng, K Zhou, R Zhou, T Hartvigsen, Y Zhang… - 2023 - openreview.net
Sparse Mixture-of-Experts (SMoE) is a promising paradigm that can be easily tailored for
multi-task learning. Its conditional computing nature allows us to organically allocate …
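
The conditional-computation mechanism behind SMoE fits in a few lines: a learned gate scores all experts, but only the top-k are evaluated per input, so capacity can be allocated per task or modality. The sketch below is a generic top-k router, not this paper's specific treatment.

# Minimal sketch of sparse MoE routing: the gate scores all experts,
# only the top-k actually run (conditional computation).
import numpy as np

rng = np.random.default_rng(0)
D, E, K = 16, 4, 2                               # dim, num experts, experts used
experts = [rng.normal(size=(D, D)) for _ in range(E)]
gate_w = rng.normal(size=(D, E))

def smoe_layer(x: np.ndarray) -> np.ndarray:
    logits = x @ gate_w
    topk = np.argsort(logits)[-K:]               # indices of the k best experts
    w = np.exp(logits[topk]); w /= w.sum()       # renormalized gate weights
    # Only the selected experts are evaluated:
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, topk))

print(smoe_layer(rng.normal(size=D)).shape)      # (16,)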

Towards a Generalist and Blind RGB-X Tracker

Y Tan, Z Wu, Y Fu, Z Zhou, G Sun, C Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
With the emergence of a single large model capable of successfully solving a multitude of
tasks in NLP, there has been growing research interest in achieving similar goals in …

Foundation Models for Video Understanding: A Survey

N Madan, A Møgelmose, R Modi, YS Rawat… - arXiv preprint arXiv …, 2024 - arxiv.org
Video Foundation Models (ViFMs) aim to learn a general-purpose representation for various
video understanding tasks. Leveraging large-scale datasets and powerful models, ViFMs …

SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR

Q Zhao, G Sun, C Zhang, M Xu, TF Zheng - arXiv preprint arXiv …, 2024 - arxiv.org
Mixture-of-experts (MoE) models have achieved excellent results in many tasks. However,
conventional MoE models are often very large, making them challenging to deploy on …
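
A mixture of LoRA experts keeps the MoE small because each expert is only a low-rank (A, B) pair added on top of one shared frozen weight. The sketch below shows that pattern generically; the gating and shapes are assumptions, not SAML's exact design.

# Sketch of a mixture of LoRA experts: the large base weight is shared
# and frozen; each "expert" is a tiny low-rank adapter, so the MoE adds
# few parameters compared with full-sized experts.
import numpy as np

rng = np.random.default_rng(0)
D, R, E = 32, 4, 3                               # dim, LoRA rank, num experts
W0 = rng.normal(size=(D, D))                     # frozen pre-trained weight
loras = [(rng.normal(size=(D, R)) * 0.1, rng.normal(size=(R, D)) * 0.1)
         for _ in range(E)]                      # tiny per-expert adapters
gate_w = rng.normal(size=(D, E))

def moe_lora(x: np.ndarray) -> np.ndarray:
    g = np.exp(x @ gate_w); g /= g.sum()         # softmax gate over experts
    delta = sum(gi * (x @ A @ B) for gi, (A, B) in zip(g, loras))
    return x @ W0 + delta                        # base output + mixed low-rank update

print(moe_lora(rng.normal(size=D)).shape)        # (32,)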