Data-driven materials research enabled by natural language processing and information extraction

EA Olivetti, JM Cole, E Kim, O Kononova… - Applied Physics …, 2020 - pubs.aip.org
Given the emergence of data science and machine learning throughout all aspects of
society, but particularly in the scientific domain, there is increased importance placed on …

Opportunities and challenges of text mining in materials research

O Kononova, T He, H Huo, A Trewartha, EA Olivetti… - iScience, 2021 - cell.com
Research publications are the major repository of scientific knowledge. However, their
unstructured and highly heterogenous format creates a significant obstacle to large-scale …

MatSciBERT: A materials domain language model for text mining and information extraction

T Gupta, M Zaki, NMA Krishnan, Mausam - npj Computational Materials, 2022 - nature.com
A large amount of materials science knowledge is generated and stored as text published in
peer-reviewed scientific literature. While recent developments in natural language …

Quantifying the advantage of domain-specific pre-training on named entity recognition tasks in materials science

A Trewartha, N Walker, H Huo, S Lee, K Cruse… - Patterns, 2022 - cell.com
A bottleneck in efficiently connecting new materials discoveries to established literature has
arisen due to an increase in publications. This problem may be addressed by using named …

An analysis of simple data augmentation for named entity recognition

X Dai, H Adel - arXiv preprint arXiv:2010.11683, 2020 - arxiv.org
Simple yet effective data augmentation techniques have been proposed for sentence-level
and sentence-pair natural language processing tasks. Inspired by these efforts, we design …

Sequential sentence classification in research papers using cross-domain multi-task learning

A Brack, E Entrup, M Stamatakis… - International Journal on …, 2024 - Springer
The automatic semantic structuring of scientific text allows for more efficient reading of
research articles and is an important indexing step for academic search engines. Sequential …

MatSci-NLP: Evaluating scientific language models on materials science language tasks using text-to-schema modeling

Y Song, S Miret, B Liu - arXiv preprint arXiv:2305.08264, 2023 - arxiv.org
We present MatSci-NLP, a natural language benchmark for evaluating the performance of
natural language processing (NLP) models on materials science text. We construct the …

Reconstructing the materials tetrahedron: challenges in materials information extraction

K Hira, M Zaki, D Sheth, NMA Krishnan - Digital Discovery, 2024 - pubs.rsc.org
The discovery of new materials has a documented history of propelling human progress for
centuries and more. The behaviour of a material is a function of its composition, structure …

Deep learning for molecules and materials

AD White - Living Journal of Computational Molecular Science, 2022 - pmc.ncbi.nlm.nih.gov
Deep learning is becoming a standard tool in chemistry and materials science. Although
there are learning materials available for deep learning, none cover the applications in …

SsciBERT: A pre-trained language model for social science texts

S Shen, J Liu, L Lin, Y Huang, L Zhang, C Liu, Y Feng… - Scientometrics, 2023 - Springer
The academic literature of social sciences records human civilization and studies human
social problems. With its large-scale growth, the ways to quickly find existing research on …