Scientific large language models: A survey on biological & chemical domains

Q Zhang, K Ding, T Lv, X Wang, Q Yin, Y Zhang… - ACM Computing …, 2024 - dl.acm.org
Large Language Models (LLMs) have emerged as a transformative power in enhancing
natural language comprehension, representing a significant stride toward artificial general …

Impacts of bioinformatics to medicinal chemistry

KC Chou - Medicinal chemistry, 2015 - ingentaconnect.com
Facing the explosive growth of biological sequence data, such as those of protein/peptide
and DNA/RNA, generated in the post-genomic age, many bioinformatical and mathematical …

Pfam: The protein families database in 2021

J Mistry, S Chuguransky, L Williams… - Nucleic acids …, 2021 - academic.oup.com
The Pfam database is a widely used resource for classifying protein sequences into families
and domains. Since Pfam was last described in this journal, over 350 new families have …

The Pfam protein families database in 2019

S El-Gebali, J Mistry, A Bateman, SR Eddy… - Nucleic acids …, 2019 - academic.oup.com
The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The
number of families has grown substantially to a total of 17,929 in release 32.0. New …

Evolutionary history of the Hymenoptera

RS Peters, L Krogmann, C Mayer, A Donath, S Gunkel… - Current Biology, 2017 - cell.com
Hymenoptera (sawflies, wasps, ants, and bees) are one of four mega-diverse
insect orders, comprising more than 153,000 described and possibly up to one million …

The Pfam protein families database: towards a more sustainable future

RD Finn, P Coggill, RY Eberhardt, SR Eddy… - Nucleic acids …, 2016 - academic.oup.com
In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial
reorganisation to reduce the effort involved in making a release, thereby permitting more …

Database resources of the national center for biotechnology information

NCBI Resource Coordinators - Nucleic acids research, 2015 - pmc.ncbi.nlm.nih.gov
The National Center for Biotechnology Information (NCBI) provides a large suite of online
resources for biological information and data, including the GenBank® nucleic acid …

The Pfam protein families database: embracing AI/ML

T Paysan-Lafosse, A Andreeva, M Blum… - Nucleic acids …, 2025 - academic.oup.com
The Pfam protein families database is a comprehensive collection of protein domains and
families used for genome annotation and protein structure and function analysis …

Using deep learning to annotate the protein universe

ML Bileschi, D Belanger, DH Bryant, T Sanderson… - Nature …, 2022 - nature.com
Understanding the relationship between amino acid sequence and protein function is a long-
standing challenge with far-reaching scientific and translational implications. State-of-the-art …

Database resources of the national center for biotechnology information

NCBI Resource Coordinators - Nucleic acids research, 2016 - academic.oup.com
The National Center for Biotechnology Information (NCBI) provides a large suite of
online resources for biological information and data, including the GenBank® nucleic acid …