Dense object detection methods in RAW UAV imagery based on YOLOv8

Z Wu, X Wang, M Jia, M Liu, C Sun, C Wu, J Wang - Scientific reports, 2024 - nature.com
Accurate, fast and lightweight dense target detection methods are highly important for
precision agriculture. To detect dense apricot flowers using drones, we propose an …

Tiny Models are the Computational Saver for Large Models

Q Wang, B Cardiff, A Frappé, B Larras… - European Conference on …, 2024 - Springer
This paper introduces TinySaver, an early-exit-like dynamic model compression approach
which employs tiny models to substitute large models adaptively. Distinct from traditional …

A survey on knowledge-enhanced multimodal learning

M Lymperaiou, G Stamou - Artificial Intelligence Review, 2024 - Springer
Multimodal learning has been a field of increasing interest, aiming to combine various
modalities in a single joint representation. Especially in the area of visiolinguistic (VL) …

XAMI--A Benchmark Dataset for Artefact Detection in XMM-Newton Optical Images

EI Dima, P Gómez, S Kruk, P Kretschmar… - arXiv preprint arXiv …, 2024 - arxiv.org
Reflected or scattered light produces artefacts in astronomical observations that can
negatively impact the scientific study. Hence, automated detection of these artefacts is highly …

Towards Automation of Pollen Monitoring: Dealing with the Background in Pollen Monitoring Images

E Kubera, A Wieczorkowska… - … Conference on Machine …, 2023 - Springer
Many people suffer from pollen allergies. Therefore, pollen monitoring is performed
worldwide, and pollen traps are used for this purpose. Specialists are analyzing the …

NII-UIT at VBS2025: Multimodal Video Retrieval with LLM Integration and Dynamic Temporal Search

BT Gia, TBC Khanh, TLT Thanh, TT Doan, K Le… - … on Multimedia Modeling, 2025 - Springer
In summary, our innovative retrieval system for interactive video search, developed for the
VBS 2025 competition, significantly elevates the user experience through the utilization of …

How well does GPT-4o understand vision? Solving standard computer vision tasks with multimodal foundation models

R Ramachandran, A Garjani, A Atanov, OF Kar… - openreview.net
Multimodal foundation models, such as GPT-4o, have made remarkable progress recently.
However, it is not clear exactly where these models stand in terms of understanding …