- Academic Search

Eyes closed, safety on: Protecting multimodal LLMs via image-to-text transformation

Y Gou, K Chen, Z Liu, L Hong, H Xu, Z Li… - … on Computer Vision, 2024 - Springer

Multimodal large language models (MLLMs) have shown impressive reasoning abilities.
However, they are also more vulnerable to jailbreak attacks than their LLM predecessors …

Cited by 25 · All 2 versions

[PDF] arxiv.org

Automated evaluation of large vision-language models on self-driving corner cases

K Chen, Y Li, W Zhang, Y Liu, P Li, R Gao… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Vision-Language Models (LVLMs) have received widespread attention for advancing
the interpretable self-driving. Existing evaluations of LVLMs primarily focus on multi-faceted …

Cited by 28 · All 2 versions · HTML version

[PDF] arxiv.org

EMOVA: Empowering language models to see, hear and speak with vivid emotions

K Chen, Y Gou, R Huang, Z Liu, D Tan, J Xu… - arXiv preprint arXiv …, 2024 - arxiv.org

GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and
tones, marks a milestone for omni-modal foundation models. However, empowering Large …

Cited by 13 · All 3 versions · HTML version

[PDF] thecvf.com

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

X He, Q Huang, Z Zhang, Z Lin, Z Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Co-speech gestures if presented in the lively form of videos can achieve superior visual
effects in human-machine interaction. While previous works mostly generate structural …

Cited by 11 · All 3 versions · HTML version

[PDF] arxiv.org

Diffusion models for intelligent transportation systems: A survey

M Peng, K Chen, X Guo, Q Zhang, H Lu… - arXiv preprint arXiv …, 2024 - arxiv.org

Intelligent Transportation Systems (ITS) are vital in modern traffic management and
optimization, significantly enhancing traffic efficiency and safety. Recently, diffusion models …

Cited by 2 · All 2 versions · HTML version

[PDF] openreview.net

DCAFuse: Dual-Branch Diffusion-CNN Complementary Feature Aggregation Network for Multi-Modality Image Fusion

X Lu, Y Jiang, H Hong, Q Sun, C Zhuo - Proceedings of the 32nd ACM …, 2024 - dl.acm.org

Multi-modality image fusion (MMIF) aims to integrate the complementary features of source
images into the fused image, including target saliency and texture specifics. Recently, image …

All 2 versions

[PDF] acm.org

LayoutEnc: Leveraging Enhanced Layout Representations for Transformer-based Complex Scene Synthesis

X Cui, Q Sun, M Wang, L Li, W Zhou, H Li - ACM Transactions on …, 2025 - dl.acm.org

In complex scene synthesis, the effective representation of layouts is paramount. This paper
introduces LayoutEnc, an advanced approach specifically designed to enhance layout …

[PDF] arxiv.org

OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation

B Li, X **, J Wang, Y Shi, Y Sun, X Wang, Z Ma… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent diffusion models have demonstrated remarkable performance in both 3D scene
generation and perception tasks. Nevertheless, existing methods typically separate these …

All 2 versions · HTML version

[PDF] arxiv.org

Training-free point cloud recognition based on geometric and semantic information fusion

Y Chen, D Huang, Z Liao, X Cheng, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org

The trend of employing training-free methods for point cloud recognition is becoming
increasingly popular due to its significant reduction in computational resources and time …

Cited by 1 · All 2 versions · HTML version

[PDF] arxiv.org

BDC-Occ: Binarized Deep Convolution Unit For Binarized Occupancy Network

Z Zhang, Z Xu, W Yang, Q Liao, JH Xue - arXiv preprint arXiv:2405.17037, 2024 - arxiv.org

Existing 3D occupancy networks demand significant hardware resources, hindering the
deployment of edge devices. Binarized Neural Networks (BNN) offer substantially reduced …

Cited by 1 · All 2 versions · HTML version

Saved in My library

DetDiffusion: Synergizing generative and perceptive models for enhanced data generation and...

Eyes closed, safety on: Protecting multimodal LLMs via image-to-text transformation

Automated evaluation of large vision-language models on self-driving corner cases

EMOVA: Empowering language models to see, hear and speak with vivid emotions

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

Diffusion models for intelligent transportation systems: A survey

DCAFuse: Dual-Branch Diffusion-CNN Complementary Feature Aggregation Network for Multi-Modality Image Fusion

LayoutEnc: Leveraging Enhanced Layout Representations for Transformer-based Complex Scene Synthesis

OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation

Training-free point cloud recognition based on geometric and semantic information fusion

BDC-Occ: Binarized Deep Convolution Unit For Binarized Occupancy Network