Accelerating materials discovery using artificial intelligence, high performance computing and robotics

EO Pyzer-Knapp, JW Pitera, PWJ Staar… - npj Computational …, 2022 - nature.com
New tools enable new ways of working, and materials science is no exception. In materials
discovery, traditional manual, serial, and human-intensive work is being augmented by …

DocLayNet: A large human-annotated dataset for document-layout segmentation

B Pfitzmann, C Auer, M Dolfi, AS Nassar… - Proceedings of the 28th …, 2022 - dl.acm.org
Accurate document layout analysis is a key requirement for high-quality PDF document
conversion. With the recent availability of public, large ground-truth datasets such as …

TableFormer: Table structure understanding with transformers

A Nassar, N Livathinos, M Lysak… - Proceedings of the …, 2022 - openaccess.thecvf.com
Tables organize valuable content in a concise and compact representation. This content is
extremely valuable for systems such as search engines, knowledge graphs, etc., since they …

PDF malware detection based on optimizable decision trees

Q Abu Al-Haija, A Odeh, H Qattous - Electronics, 2022 - mdpi.com
Portable document format (PDF) files are one of the most universally used file types. This
has incentivized hackers to develop methods to use these normally innocent PDF files to …

An overview on the role of artificial intelligence in modern advancements of material science

M Das, TC Perez, D Shetty, P Hiremath, N Naik… - ES General, 2024 - espublisher.com
Artificial intelligence (AI) has become a disruptive force in many industries over the past few
decades, and the subjects of material science and engineering are no exception. This …

Skin tone analysis for representation in educational materials (STAR-ED) using machine learning

GA Tadesse, C Cintas, KR Varshney, P Staar… - npj Digital …, 2023 - nature.com
Images depicting dark skin tones are significantly underrepresented in the educational
materials used to teach primary care physicians and dermatologists to recognize skin …

VILA: Improving structured content extraction from scientific PDFs using visual layout groups

Z Shen, K Lo, LL Wang, B Kuehl, DS Weld… - Transactions of the …, 2022 - direct.mit.edu
Accurately extracting structured content from PDFs is a critical first step for NLP over
scientific papers. Recent work has improved extraction accuracy by incorporating …

Optimized table tokenization for table structure recognition

M Lysak, A Nassar, N Livathinos, C Auer… - … Conference on Document …, 2023 - Springer
Extracting tables from documents is a crucial task in any document conversion pipeline.
Recently, transformer-based models have demonstrated that table-structure can be …

A benchmark of PDF information extraction tools using a multi-task and multi-domain evaluation framework for academic documents

N Meuschke, A Jagdale, T Spinde, J Mitrović… - International Conference …, 2023 - Springer
Extracting information from academic PDF documents is crucial for numerous indexing,
retrieval, and analysis use cases. Choosing the best tool to extract specific content elements …

FETA: Towards specializing foundational models for expert task applications

A Alfassy, A Arbelle, O Halimi… - Advances in …, 2022 - proceedings.neurips.cc
Foundational Models (FMs) have demonstrated unprecedented capabilities
including zero-shot learning, high fidelity data synthesis, and out of domain generalization …