Open-Sora: Democratizing efficient video production for all

Z Zheng, X Peng, T Yang, C Shen, S Li, H Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Vision and language are the two foundational senses for humans, and they build up our
cognitive ability and intelligence. While significant breakthroughs have been made in AI …

Cited by 77 - 2 versions

[PDF] neurips.cc

Stable and low-precision training for large-scale vision-language models

M Wortsman, T Dettmers… - Advances in …, 2023 - proceedings.neurips.cc

We introduce new methods for 1) accelerating and 2) stabilizing training for large language-
vision models. 1) For acceleration, we introduce SwitchBack, a linear layer for int8 quantized …

Cited by 36 - 6 versions

[PDF] arxiv.org

Virchow2: Scaling self-supervised mixed magnification models in pathology

E Zimmermann, E Vorontsov, J Viret, A Casson… - arXiv preprint arXiv …, 2024 - arxiv.org

Foundation models are rapidly being developed for computational pathology applications.
However, it remains an open question which factors are most important for downstream …

Cited by 21 - 4 versions

[PDF] arxiv.org

On the implicit bias of Adam

MD Cattaneo, JM Klusowski, B Shigida - arXiv preprint arXiv:2309.00079, 2023 - arxiv.org

In previous literature, backward error analysis was used to find ordinary differential
equations (ODEs) approximating the gradient descent trajectory. It was found that finite step …

Cited by 20 - 6 versions

[PDF] arxiv.org

XGen-7B technical report

E Nijkamp, T Xie, H Hayashi, B Pang, C Xia… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) have become ubiquitous across various domains,
transforming the way we interact with information and conduct research. However, most high …

Cited by 29 - 3 versions

[PDF] arxiv.org

Jointly training large autoregressive multimodal models

E Aiello, L Yu, Y Nie, A Aghajanyan, B Oguz - arXiv preprint arXiv …, 2023 - arxiv.org

In recent years, advances in the large-scale pretraining of language and text-to-image
models have revolutionized the field of machine learning. Yet, integrating these two …

Cited by 23 - 3 versions

[PDF] acm.org

Towards foundation models for materials science: The Open MatSci ML Toolkit

KLK Lee, C Gonzales, M Spellings, M Galkin… - Proceedings of the SC' …, 2023 - dl.acm.org

Artificial intelligence and machine learning have shown great promise in their ability to
accelerate novel materials discovery. As researchers and domain scientists seek to unify …

Cited by 7 - 3 versions

[PDF] arxiv.org

Why transformers need Adam: A Hessian perspective

Y Zhang, C Chen, T Ding, Z Li, R Sun… - arXiv preprint arXiv …, 2024 - arxiv.org

SGD performs worse than Adam by a significant margin on Transformers, but the reason
remains unclear. In this work, we provide an explanation of SGD's failure on Transformers …

Cited by 25 - 2 versions

[PDF] neurips.cc

ReContrast: Domain-specific anomaly detection via contrastive reconstruction

J Guo, L Jia, W Zhang, H Li - Advances in Neural …, 2024 - proceedings.neurips.cc

Most advanced unsupervised anomaly detection (UAD) methods rely on modeling feature
representations of frozen encoder networks pre-trained on large-scale datasets, e.g. …

Cited by 23 - 5 versions

[PDF] mlr.press

Data efficient neural scaling law via model reusing

P Wang, R Panda, Z Wang - International Conference on …, 2023 - proceedings.mlr.press

The number of parameters in large transformers has been observed to grow exponentially.
Despite notable performance improvements, concerns have been raised that such a …

Cited by 10 - 5 versions
