[HTML][HTML] A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas
YOLO has become a central real-time object detection system for robotics, driverless cars,
and video monitoring applications. We present a comprehensive analysis of YOLO's …
and video monitoring applications. We present a comprehensive analysis of YOLO's …
A comprehensive overview of large language models
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …
natural language processing tasks and beyond. This success of LLMs has led to a large …
Segment anything
Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …
image segmentation. Using our efficient model in a data collection loop, we built the largest …
Large language models generate functional protein sequences across diverse families
Deep-learning language models have shown promise in various biotechnological
applications, including protein design and engineering. Here we describe ProGen, a …
applications, including protein design and engineering. Here we describe ProGen, a …
Solving olympiad geometry without human demonstrations
Proving mathematical theorems at the olympiad level represents a notable milestone in
human-level automated reasoning,,–, owing to their reputed difficulty among the world's best …
human-level automated reasoning,,–, owing to their reputed difficulty among the world's best …
A comprehensive survey of continual learning: theory, method and application
To cope with real-world dynamics, an intelligent system needs to incrementally acquire,
update, accumulate, and exploit knowledge throughout its lifetime. This ability, known as …
update, accumulate, and exploit knowledge throughout its lifetime. This ability, known as …
Vid2seq: Large-scale pretraining of a visual language model for dense video captioning
In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning
model pretrained on narrated videos which are readily-available at scale. The Vid2Seq …
model pretrained on narrated videos which are readily-available at scale. The Vid2Seq …
ediff-i: Text-to-image diffusion models with an ensemble of expert denoisers
Large-scale diffusion-based generative models have led to breakthroughs in text-
conditioned high-resolution image synthesis. Starting from random noise, such text-to-image …
conditioned high-resolution image synthesis. Starting from random noise, such text-to-image …
Robust deep learning–based protein sequence design using ProteinMPNN
Although deep learning has revolutionized protein structure prediction, almost all
experimentally characterized de novo protein designs have been generated using …
experimentally characterized de novo protein designs have been generated using …
A generalist agent
Inspired by progress in large-scale language modeling, we apply a similar approach
towards building a single generalist agent beyond the realm of text outputs. The agent …
towards building a single generalist agent beyond the realm of text outputs. The agent …