Google Scholar

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

Z Qin, D Chen, W Zhang, L Yao, Y Huang… - arXiv preprint arXiv …, 2024 - arxiv.org

The rapid development of large language models (LLMs) has been witnessed in recent
years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …

Cited by 5

VideoMaker: Zero-shot customized video generation with the inherent force of video diffusion models

T Wu, Y Zhang, X Cun, Z Qi, J Pu, H Dou… - arXiv preprint arXiv …, 2024 - arxiv.org

Zero-shot customized video generation has gained significant attention due to its substantial
application potential. Existing methods rely on additional models to extract and inject …

Cited by 1

CorrCLIP: Reconstructing correlations in CLIP with off-the-shelf foundation models for open-vocabulary semantic segmentation

D Zhang, F Liu, Q Tang - arXiv preprint arXiv:2411.10086, 2024 - arxiv.org

Open-vocabulary semantic segmentation aims to assign semantic labels to each pixel
without relying on a predefined set of categories. Contrastive Language-Image Pre-training …

Cited by 1

CLIP-MoE: Towards building mixture of experts for CLIP with diversified multiplet upcycling

J Zhang, X Qu, T Zhu, Y Cheng - arXiv preprint arXiv:2409.19291, 2024 - arxiv.org

In recent years, Contrastive Language-Image Pre-training (CLIP) has become a cornerstone
in multimodal intelligence. However, recent studies have identified that the information loss …

Cited by 1

Decentralized Diffusion Models

D McAllister, M Tancik, J Song, A Kanazawa - arXiv preprint arXiv …, 2025 - arxiv.org

Large-scale AI model training divides work across thousands of GPUs, then synchronizes
gradients across them at each step. This incurs a significant network burden that only …

QR-DETR: Query Routing for Detection Transformer

T Senthivel, NS Vu - … of the Asian Conference on Computer …, 2024 - openaccess.thecvf.com

Detection Transformer (DETR) predicts object bounding boxes and classes from learned
object queries. However, DETR exhibits three major flaws: (1) Only a subset of object queries …

WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization

J Ma, Y Niu, S Huang, G Han, SF Chang - arXiv preprint arXiv:2405.18405, 2024 - arxiv.org

Language has been useful in extending the vision encoder to data from diverse distributions
without empirical discovery in training domains. However, as the image description is mostly …

Cited by 1

Towards maintainable machine learning development through continual and modular learning

O Ostapenko - 2024 - papyrus.bib.umontreal.ca

As machine learning models grow in size and complexity, their maintainability becomes a
critical concern, especially when they are increasingly deployed in dynamic, real-world …

Novel Techniques in Addressing Label Bias & Noise in Low-Quality Real-World Data

J Ma - 2024 - search.proquest.com

Data serves as the foundation in building effective deep learning algorithms, yet the process
of annotation and curation to maintain high data quality is time-intensive. The challenges …

MoDE: CLIP Data Experts via Clustering
