A review of modern recommender systems using generative models (Gen-RecSys)

Y Deldjoo, Z He, J McAuley, A Korikov… - Proceedings of the 30th …, 2024 - dl.acm.org
Traditional recommender systems typically use user-item rating histories as their main data
source. However, deep generative models now have the capability to model and sample …

Interpretability research of deep learning: A literature survey

B Xu, G Yang - Information Fusion, 2024 - Elsevier
Deep learning (DL) has been widely used in various fields. However, its black-box nature
limits people's understanding and trust in its decision-making process. Therefore, it becomes …

AdaShield: Safeguarding multimodal large language models from structure-based attack via adaptive shield prompting

Y Wang, X Liu, Y Li, M Chen, C Xiao - European Conference on Computer …, 2024 - Springer
With the advent and widespread deployment of Multimodal Large Language Models
(MLLMs), the imperative to ensure their safety has become increasingly pronounced …

BRAVE: Broadening the visual encoding of vision-language models

OF Kar, A Tonioni, P Poklukar, A Kulshrestha… - … on Computer Vision, 2024 - Springer
Vision-language models (VLMs) are typically composed of a vision encoder, e.g. CLIP, and a
language model (LM) that interprets the encoded features to solve downstream tasks …

LLM inference unveiled: Survey and roofline model insights

Z Yuan, Y Shang, Y Zhou, Z Dong, Z Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a
unique blend of opportunities and challenges. Although the field has expanded and is …

BB-GeoGPT: A framework for learning a large language model for geographic information science

Y Zhang, Z Wang, Z He, J Li, G Mai, J Lin, C Wei… - Information Processing …, 2024 - Elsevier
Large language models (LLMs) exhibit impressive capabilities across diverse tasks in
natural language processing. Nevertheless, challenges arise such as large model …

A survey of multimodal large language model from a data-centric perspective

T Bai, H Liang, B Wan, Y Xu, X Li, S Li, L Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal large language models (MLLMs) enhance the capabilities of standard large
language models by integrating and processing data from multiple modalities, including text …

MobileVLM V2: Faster and stronger baseline for vision language model

X Chu, L Qiao, X Zhang, S Xu, F Wei, Y Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce MobileVLM V2, a family of significantly improved vision language models
upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an …

Facial affective behavior analysis with instruction tuning

Y Li, A Dao, W Bao, Z Tan, T Chen, H Liu… - European Conference on …, 2024 - Springer
Facial affective behavior analysis (FABA) is crucial for understanding human mental states
from images. However, traditional approaches primarily deploy models to discriminate …

A comprehensive review of multimodal large language models: Performance and challenges across different tasks

J Wang, H Jiang, Y Liu, C Ma, X Zhang, Y Pan… - arXiv preprint arXiv …, 2024 - arxiv.org
In an era defined by the explosive growth of data and rapid technological advancements,
Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence …