Data-centric artificial intelligence: A survey

D Zha, ZP Bhat, KH Lai, F Yang, Z Jiang… - ACM Computing …, 2025 - dl.acm.org
Artificial Intelligence (AI) is making a profound impact in almost every domain. A vital enabler
of its great success is the availability of abundant and high-quality data for building machine …

I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction

X Zhou, W Zheng, Y Li, R Pearce, C Zhang, EW Bell… - Nature Protocols, 2022 - nature.com
Most proteins in cells are composed of multiple folding units (or domains) to perform
complex functions in a cooperative manner. Relative to the rapid progress in single-domain …

The person-to-person transmission landscape of the gut and oral microbiomes

M Valles-Colomer, A Blanco-Míguez, P Manghi… - Nature, 2023 - nature.com
The human microbiome is an integral component of the human body and a co-determinant
of several health conditions. However, the extent to which interpersonal relations shape the …

OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization

G Ahdritz, N Bouatta, C Floristean, S Kadyan, Q Xia… - Nature …, 2024 - nature.com
AlphaFold2 revolutionized structural biology with the ability to predict protein structures with
exceptionally high accuracy. Its implementation, however, lacks the code and data required …

Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4

A Blanco-Míguez, F Beghini, F Cumbo, LJ McIver… - Nature …, 2023 - nature.com
Metagenomic assembly enables new organism discovery from microbial communities, but it
can only capture few abundant organisms from most metagenomes. Here we present …

AlphaFold2 and its applications in the fields of biology and medicine

Z Yang, X Zeng, Y Zhao, R Chen - Signal Transduction and Targeted …, 2023 - nature.com
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind
that can predict three-dimensional (3D) structures of proteins from amino acid sequences …

ProtGPT2 is a deep unsupervised language model for protein design

N Ferruz, S Schmidt, B Höcker - Nature Communications, 2022 - nature.com
Protein design aims to build novel proteins customized for specific purposes, thereby
holding the potential to tackle many environmental and biomedical problems. Recent …

Learning inverse folding from millions of predicted structures

C Hsu, R Verkuil, J Liu, Z Lin, B Hie… - International …, 2022 - proceedings.mlr.press
We consider the problem of predicting a protein sequence from its backbone atom
coordinates. Machine learning approaches to this problem to date have been limited by the …

Variability of strain engraftment and predictability of microbiome composition after fecal microbiota transplantation across different diseases

G Ianiro, M Punčochář, N Karcher, S Porcari… - Nature Medicine, 2022 - nature.com
Fecal microbiota transplantation (FMT) is highly effective against recurrent Clostridioides
difficile infection and is considered a promising treatment for other microbiome-related …

An expanded arsenal of immune systems that protect bacteria from phages

A Millman, S Melamed, A Leavitt, S Doron… - Cell Host & …, 2022 - cell.com
Bacterial anti-phage systems are frequently clustered in microbial genomes, forming
defense islands. This property enabled the recent discovery of multiple defense systems …