Tool learning with foundation models
Humans possess an extraordinary ability to create and utilize tools. With the advent of
foundation models, artificial intelligence systems have the potential to be equally adept in …
A survey of embodied AI: From simulators to research tasks
There has been an emerging paradigm shift from the era of “internet AI” to “embodied AI,”
where AI algorithms and agents no longer learn from datasets of images, videos or text …
Objaverse: A universe of annotated 3D objects
Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebImageText, and
LAION have propelled recent dramatic progress in AI. Large neural models trained on such …
Synthetic data from diffusion models improves ImageNet classification
Deep generative models are becoming increasingly powerful, now generating diverse high
fidelity photo-realistic samples given text prompts. Have they reached the point where …
On the opportunities and risks of foundation models
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …
Habitat 2.0: Training home assistants to rearrange their habitat
We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in
interactive 3D environments and complex physics-enabled scenarios. We make …
🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
Massive datasets and high-capacity models have driven many recent advancements in
computer vision and natural language understanding. This work presents a platform to …
Improving multimodal datasets with image captioning
Massive web datasets play a key role in the success of large vision-language models like
CLIP and Flamingo. However, the raw web data is noisy, and existing filtering methods to …
Kubric: A scalable dataset generator
Data is the driving force of machine learning, with the amount and quality of training data
often being more important for the performance of a system than architecture and training …
AI2-THOR: An interactive 3D environment for visual AI
We introduce The House Of inteRactions (THOR), a framework for visual AI research,
available at http://ai2thor.allenai.org. AI2-THOR consists of near photo-realistic 3D indoor …