MMT-Bench: A comprehensive multimodal benchmark for evaluating large vision-language models towards multitask AGI

K Ying, F Meng, J Wang, Z Li, H Lin, Y Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Vision-Language Models (LVLMs) show significant strides in general-purpose
multimodal applications such as visual dialogue and embodied navigation. However …

Segment anything in non-Euclidean domains: Challenges and opportunities

Y Jing, X Wang, D Tao - arXiv preprint arXiv:2304.11595, 2023 - arxiv.org
The recent work known as Segment Anything (SA) has made significant strides in pushing
the boundaries of semantic segmentation into the era of foundation models. The impact of …

Autodecoding latent 3D diffusion models

E Ntavelis, A Siarohin, K Olszewski… - Advances in …, 2023 - proceedings.neurips.cc
Diffusion-based methods have shown impressive visual results in the text-to-image domain.
They first learn a latent space using an autoencoder, then run a denoising process on the …

Beyond observation: Deep learning for animal behavior and ecological conservation

LS Saoud, A Sultan, M Elmezain, M Heshmat… - Ecological …, 2024 - Elsevier
Recent advancements in deep learning have profoundly impacted the field of animal
behavioral research, offering researchers powerful tools for understanding the complexities …

Neural interactive keypoint detection

J Yang, A Zeng, F Li, S Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
This work proposes an end-to-end neural interactive keypoint detection framework named
Click-Pose, which can significantly reduce more than 10 times labeling costs of 2D keypoint …

Matching is not enough: A two-stage framework for category-agnostic pose estimation

M Shi, Z Huang, X Ma, X Hu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Category-agnostic pose estimation (CAPE) aims to predict keypoints for arbitrary categories
given support images with keypoint annotations. Existing approaches match the keypoints …

Harnessing text-to-image diffusion models for category-agnostic pose estimation

D Peng, Z Zhang, P Hu, Q Ke, DKY Yau… - European Conference on …, 2024 - Springer
Category-Agnostic Pose Estimation (CAPE) aims to detect keypoints of an arbitrary
unseen category in images, based on several provided examples of that category. This is a …

Detect any keypoints: An efficient light-weight few-shot keypoint detector

C Lu, P Koniusz - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Recently the prompt-based models have become popular across various language and
vision tasks. In this work, we perform few-shot keypoint detection (FSKD) by detecting any …

ESCAPE: Encoding super-keypoints for category-agnostic pose estimation

KD Nguyen, C Li, GH Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
In this paper we tackle the task of category-agnostic pose estimation (CAPE) which aims to
predict poses for objects of any category with few annotated samples. Previous works either …

Meta-Point Learning and Refining for Category-Agnostic Pose Estimation

J Chen, J Yan, Y Fang, L Niu - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Category-agnostic pose estimation (CAPE) aims to predict keypoints for arbitrary classes
given a few support images annotated with keypoints. Existing methods only rely on the …