Artificial intelligence for science in quantum, atomistic, and continuum systems

X Zhang, L Wang, J Helwig, Y Luo, C Fu, Y Xie… - arXiv preprint arXiv …, 2023 - arxiv.org
Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural
sciences. Today, AI has started to advance natural sciences by improving, accelerating, and …

Symmetry-informed geometric representation for molecules, proteins, and crystalline materials

S Liu, Y Li, Z Li, Z Zheng, C Duan… - Advances in neural …, 2023 - proceedings.neurips.cc
Artificial intelligence for scientific discovery has recently generated significant interest within
the machine learning and scientific communities, particularly in the domains of chemistry …

Clustering for protein representation learning

R Quan, W Wang, F Ma, H Fan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Protein representation learning is a challenging task that aims to capture the structure and
function of proteins from their amino acid sequences. Previous methods largely ignored the …

Advances of deep learning in protein science: A comprehensive survey

B Hu, C Tan, L Wu, J Zheng, J Xia, Z Gao, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Protein representation learning plays a crucial role in understanding the structure and
function of proteins, which are essential biomolecules involved in various biological …

ProtT3: Protein-to-text generation for text-based protein understanding

Z Liu, A Zhang, H Fei, E Zhang, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Language Models (LMs) excel in understanding textual descriptions of proteins, as evident
in biomedical question-answering tasks. However, their capability falters with raw protein …

A systematic study of joint representation learning on protein sequences and structures

Z Zhang, C Wang, M Xu, V Chenthamarakshan… - arXiv preprint arXiv …, 2023 - arxiv.org
Learning effective protein representations is critical in a variety of tasks in biology such as
predicting protein functions. Recent sequence representation learning methods based on …

FoldToken: Learning protein language via vector quantization and beyond

Z Gao, C Tan, J Wang, Y Huang, L Wu, SZ Li - arXiv preprint arXiv …, 2024 - arxiv.org
Is there a foreign language describing protein sequences and structures simultaneously?
Protein structures, represented by continuous 3D points, have long posed a challenge due …

CLIPZyme: Reaction-conditioned virtual screening of enzymes

PG Mikhael, I Chinn, R Barzilay - arXiv preprint arXiv:2402.06748, 2024 - arxiv.org
Computational screening of naturally occurring proteins has the potential to identify efficient
catalysts among the hundreds of millions of sequences that remain uncharacterized. Current …

Evaluating representation learning on the protein structure universe

AR Jamasb, A Morehead, CK Joshi, Z Zhang, K Didi… - ArXiv, 2024 - pmc.ncbi.nlm.nih.gov
We introduce ProteinWorkshop, a comprehensive benchmark suite for representation
learning on protein structures with Geometric Graph Neural Networks. We consider large …

Learning Complete Protein Representation by Dynamically Coupling of Sequence and Structure

B Hu, C Tan, J Xia, Y Liu, L Wu… - Advances in …, 2025 - proceedings.neurips.cc
Learning effective representations is imperative for comprehending proteins and
deciphering their biological functions. Recent strides in language models and graph neural …