Google Scholar

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 95

[PDF] sciencedirect.com

Transfer learning in environmental remote sensing

Y Ma, S Chen, S Ermon, DB Lobell - Remote Sensing of Environment, 2024 - Elsevier

Machine learning (ML) has proven to be a powerful tool for utilizing the rapidly
increasing amounts of remote sensing data for environmental monitoring. Yet ML models …

Cited by 120

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Cited by 231

[PDF] thecvf.com

Sequential modeling enables scalable learning for large vision models

Y Bai, X Geng, K Mangalam, A Bar… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this, we define a common …

Cited by 158

[PDF] nature.com

Cellpose 2.0: how to train your own model

M Pachitariu, C Stringer - Nature methods, 2022 - nature.com

Pretrained neural network models for biological segmentation can provide good out-of-the-
box results for many image types. However, such models do not allow users to adapt the …

Cited by 652

[PDF] thecvf.com

Unified-IO 2: Scaling autoregressive multimodal models with vision, language, audio, and action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present Unified-IO 2, a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …

Cited by 124

[PDF] thecvf.com

UniDepth: Universal monocular metric depth estimation

L Piccinelli, YH Yang, C Sakaridis… - Proceedings of the …, 2024 - openaccess.thecvf.com

Accurate monocular metric depth estimation (MMDE) is crucial to solving downstream tasks
in 3D perception and modeling. However, the remarkable accuracy of recent MMDE methods …

Cited by 86

[PDF] thecvf.com

Metric3D: Towards zero-shot metric 3D prediction from a single image

W Yin, C Zhang, H Chen, Z Cai, G Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Reconstructing accurate 3D scenes from images is a long-standing vision task. Due to the ill-
posedness of the single-image reconstruction problem, most well-established methods are …

Cited by 139

[PDF] thecvf.com

PointCLIP: Point cloud understanding by CLIP

R Zhang, Z Guo, W Zhang, K Li… - Proceedings of the …, 2022 - openaccess.thecvf.com

Recently, zero-shot and few-shot learning via Contrastive Vision-Language Pre-training
(CLIP) have shown inspirational performance on 2D visual recognition, which learns to …

Cited by 466

[PDF] arxiv.org

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Cited by 4802
