Security and privacy in metaverse: A comprehensive survey

Y Huang, YJ Li, Z Cai - Big Data Mining and Analytics, 2023 - ieeexplore.ieee.org
Metaverse describes a new shape of cyberspace and has become a hot-trending word since
2021. There are many explanations about what Metaverse is and attempts to provide a …

Tabular data augmentation for machine learning: Progress and prospects of embracing generative AI

L Cui, H Li, K Chen, L Shou, G Chen - arXiv preprint arXiv:2407.21523, 2024 - arxiv.org
Machine learning (ML) on tabular data is ubiquitous, yet obtaining abundant high-quality
tabular data for model training remains a significant obstacle. Numerous works have …

Cost-based or learning-based? A hybrid query optimizer for query plan selection

X Yu, C Chai, G Li, J Liu - Proceedings of the VLDB Endowment, 2022 - dl.acm.org
Traditional cost-based optimizers are efficient and stable to generate optimal plans for
simple SQL queries, but they may not generate high-quality plans for complicated queries …

Learned data-aware image representations of line charts for similarity search

Y Luo, Y Zhou, N Tang, G Li, C Chai… - Proceedings of the ACM on …, 2023 - dl.acm.org
Finding line-chart images similar to a given line-chart image query is a common task in data
exploration and image query systems, e.g., finding similar trends in stock markets or medical …

GoodCore: Data-effective and data-efficient machine learning through coreset selection over incomplete data

C Chai, J Liu, N Tang, J Fan, D Miao, J Wang… - Proceedings of the …, 2023 - dl.acm.org
Given a dataset with incomplete data (e.g., missing values), training a machine learning
model over the incomplete data requires two steps. First, it requires a data-effective step that …

HAIChart: Human and AI paired visualization system

Y Xie, Y Luo, G Li, N Tang - arXiv preprint arXiv:2406.11033, 2024 - arxiv.org
The growing importance of data visualization in business intelligence and data science
emphasizes the need for tools that can efficiently generate meaningful visualizations from …

LakeBench: A benchmark for discovering joinable and unionable tables in data lakes

Y Deng, C Chai, L Cao, Q Yuan, S Chen, Y Yu… - Proceedings of the …, 2024 - dl.acm.org
Discovering tables from poorly maintained data lakes is a significant challenge in data
management. Two key tasks are identifying joinable and unionable tables, crucial for data …

Coresets over multiple tables for feature-rich and data-efficient machine learning

J Wang, C Chai, N Tang, J Liu, G Li - Proceedings of the VLDB …, 2022 - dl.acm.org
Successful machine learning (ML) needs to learn from good data. However, one common
issue about training data for ML practitioners is the lack of good features. To mitigate this …

Optimizing data acquisition to enhance machine learning performance

T Wang, S Huang, Z Bao, JS Culpepper… - Proceedings of the …, 2024 - dl.acm.org
In this paper, we study how to acquire labeled data points from a large data pool to enrich a
training set for enhancing supervised machine learning (ML) performance. The state-of-the …

GFS: Graph-based feature synthesis for prediction over relational databases

H Zhang, Q Gan, D Wipf, W Zhang - arXiv preprint arXiv:2312.02037, 2023 - arxiv.org
Relational databases are extensively utilized in a variety of modern information system
applications, and they always carry valuable data patterns. There are a huge number of data …