The design of fast and lightweight resemblance detection for efficient post-deduplication delta compression

W Xia, L Pu, X Zou, P Shilane, S Li, H Zhang… - ACM Transactions on …, 2023 - dl.acm.org
Post-deduplication delta compression is a data reduction technique that calculates and
stores the differences of very similar but non-duplicate chunks in storage systems, which is …

From hyper-dimensional structures to linear structures: maintaining deduplicated data's locality

X Zou, J Yuan, P Shilane, W Xia, H Zhang… - ACM Transactions on …, 2022 - dl.acm.org
Data deduplication is widely used to reduce the size of backup workloads, but it has the
known disadvantage of causing poor data locality, also referred to as the fragmentation …

Building a high-performance fine-grained deduplication framework for backup storage with high deduplication ratio

X Zou, W Xia, P Shilane, H Zhang, X Wang - 2022 USENIX Annual …, 2022 - usenix.org
Fine-grained deduplication, which first removes identical chunks and then eliminates
redundancies between similar but non-identical chunks (i.e., delta compression), could …

The dilemma between deduplication and locality: Can both be achieved?

X Zou, J Yuan, P Shilane, W Xia, H Zhang… - … USENIX conference on …, 2021 - usenix.org
Data deduplication is widely used to reduce the size of backup workloads, but it has the
known disadvantage of causing poor data locality, also referred to as the fragmentation …

DeepSketch: A new machine Learning-Based reference search technique for Post-Deduplication delta compression

J Park, J Kim, Y Kim, S Lee, O Mutlu - 20th USENIX Conference on File …, 2022 - usenix.org
Data reduction in storage systems is an effective solution to minimize the management cost
of a data center. To maximize data-reduction efficiency, prior works propose post …

TiDedup: A New Distributed Deduplication Architecture for Ceph

M Oh, S Lee, S Just, YJ Yu, DH Bae, S Weil… - 2023 USENIX Annual …, 2023 - usenix.org
This paper presents TiDedup, a new cluster-level deduplication architecture for Ceph, a
widely deployed distributed storage system. Ceph introduced a cluster-level deduplication …

Odess: Speeding up resemblance detection for redundancy elimination by fast content-defined sampling

X Zou, C Deng, W Xia, P Shilane, H Tan… - 2021 IEEE 37th …, 2021 - ieeexplore.ieee.org
Multiple data reduction techniques have been investigated to lower storage costs for a wide
variety of customers. In this work, we focus on similarity-based delta compression, which …

A fragmentation-aware redundancy elimination scheme for inline backup systems

Y Zhang, W Zhu, D Feng, W Huang, N Jiang… - Future Generation …, 2024 - Elsevier
Data deduplication is a widely employed technique in backup systems to enhance storage
efficiency by eliminating duplicate chunks. Delta compression is a technique that …

Everything You Always Wanted to Know About Storage Compressibility of Pre-Trained ML Models but Were Afraid to Ask

Z Su, A Ahmed, Z Wang, A Anwar, Y Cheng - arXiv preprint arXiv …, 2024 - arxiv.org
As the number of pre-trained machine learning (ML) models is growing exponentially, data
reduction tools are not catching up. Existing data reduction techniques are not specifically …

Improving restore performance for in-line backup system combining deduplication and delta compression

Y Zhang, Y Yuan, D Feng, C Wang… - … on Parallel and …, 2020 - ieeexplore.ieee.org
Data deduplication, though being efficient in removing duplicate chunks, introduces chunk
fragmentation which decreases restore performance. Rewriting algorithms are proposed to …