The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
The rapid development of large language models (LLMs) has been witnessed in recent
years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …
VideoMaker: Zero-shot customized video generation with the inherent force of video diffusion models
Zero-shot customized video generation has gained significant attention due to its substantial
application potential. Existing methods rely on additional models to extract and inject …
CorrCLIP: Reconstructing correlations in CLIP with off-the-shelf foundation models for open-vocabulary semantic segmentation
D Zhang, F Liu, Q Tang - arXiv preprint arXiv:2411.10086, 2024 - arxiv.org
Open-vocabulary semantic segmentation aims to assign semantic labels to each pixel
without relying on a predefined set of categories. Contrastive Language-Image Pre-training …
CLIP-MoE: Towards building mixture of experts for CLIP with diversified multiplet upcycling
In recent years, Contrastive Language-Image Pre-training (CLIP) has become a cornerstone
in multimodal intelligence. However, recent studies have identified that the information loss …
Decentralized Diffusion Models
Large-scale AI model training divides work across thousands of GPUs, then synchronizes
gradients across them at each step. This incurs a significant network burden that only …
QR-DETR: Query Routing for Detection Transformer
Detection Transformer (DETR) predicts object bounding boxes and classes from learned
object queries. However, DETR exhibits three major flaws: (1) Only a subset of object queries …
WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization
Language has been useful in extending the vision encoder to data from diverse distributions
without empirical discovery in training domains. However, as the image description is mostly …
Towards maintainable machine learning development through continual and modular learning
O Ostapenko - 2024 - papyrus.bib.umontreal.ca
As machine learning models grow in size and complexity, their maintainability becomes a
critical concern, especially when they are increasingly deployed in dynamic, real-world …
Novel Techniques in Addressing Label Bias & Noise in Low-Quality Real-World Data
J Ma - 2024 - search.proquest.com
Data serves as the foundation in building effective deep learning algorithms, yet the process
of annotation and curation to maintain high data quality is time-intensive. The challenges …