Re-thinking data strategy and integration for artificial intelligence: concepts, opportunities, and challenges

A Aldoseri, KN Al-Khalifa, AM Hamouda - Applied Sciences, 2023 - mdpi.com
The use of artificial intelligence (AI) is becoming more prevalent across industries such as
healthcare, finance, and transportation. Artificial intelligence is based on the analysis of …

Data-centric artificial intelligence: A survey

D Zha, ZP Bhat, KH Lai, F Yang, Z Jiang… - ACM Computing …, 2025 - dl.acm.org
Artificial Intelligence (AI) is making a profound impact in almost every domain. A vital enabler
of its great success is the availability of abundant and high-quality data for building machine …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

FinGPT: Democratizing internet-scale data for financial large language models

XY Liu, G Wang, H Yang, D Zha - arXiv preprint arXiv:2307.10485, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable proficiency in
understanding and generating human-like texts, which may potentially revolutionize the …

Machine psychology: Investigating emergent capabilities and behavior in large language models using psychological methods

T Hagendorff - arXiv preprint arXiv:2303.13988, 2023 - cybershafarat.com
Large language models (LLMs) are currently at the forefront of intertwining AI systems with
human communication and everyday life. Due to rapid technological advances and their …

Data-centric AI: Perspectives and challenges

D Zha, ZP Bhat, KH Lai, F Yang, X Hu - Proceedings of the 2023 SIAM …, 2023 - SIAM
The role of data in building AI systems has recently been significantly magnified by the
emerging concept of data-centric AI (DCAI), which advocates a fundamental shift from model …

Data‐Driven Design for Metamaterials and Multiscale Systems: A Review

D Lee, W Chen, L Wang, YC Chan… - Advanced …, 2024 - Wiley Online Library
Metamaterials are artificial materials designed to exhibit effective material parameters that
go beyond those found in nature. Composed of unit cells with rich designability that are …

The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review

D Schwabe, K Becker, M Seyferth, A Klaß… - NPJ Digital …, 2024 - nature.com
The adoption of machine learning (ML) and, more specifically, deep learning (DL)
applications into all major areas of our lives is underway. The development of trustworthy AI …

Findings of the BabyLM Challenge: Sample-efficient pretraining on developmentally plausible corpora

A Warstadt, A Mueller, L Choshen… - … of the BabyLM …, 2023 - research-collection.ethz.ch
Children can acquire language from less than 100 million words of input. Large language
models are far less data-efficient: they typically require 3 or 4 orders of magnitude more data …

OpenDataVal: a unified benchmark for data valuation

K Jiang, W Liang, JY Zou… - Advances in Neural …, 2023 - proceedings.neurips.cc
Assessing the quality and impact of individual data points is critical for improving model
performance and mitigating undesirable biases within the training dataset. Several data …