A survey and classification of storage deduplication systems

J Paulo, J Pereira - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
The automatic elimination of duplicate data in a storage system, commonly known as
deduplication, is increasingly accepted as an effective technique to reduce storage costs …

Data deduplication techniques for efficient cloud storage management: a systematic review

R Kaur, I Chana, J Bhattacharya - The Journal of Supercomputing, 2018 - Springer
The exponential growth of digital data in cloud storage systems is a critical issue presently
as a large amount of duplicate data in the storage systems exerts an extra load on it …

A comprehensive study of the past, present, and future of data deduplication

W Xia, H Jiang, D Feng, F Douglis… - Proceedings of the …, 2016 - ieeexplore.ieee.org
Data deduplication, an efficient approach to data reduction, has gained increasing attention
and popularity in large-scale storage systems due to the explosive growth of digital data. It …

A study of practical deduplication

DT Meyer, WJ Bolosky - ACM Transactions on Storage (ToS), 2012 - dl.acm.org
We collected file system content data from 857 desktop computers at Microsoft over a span
of 4 weeks. We analyzed the data to determine the relative efficacy of data deduplication …

To FUSE or not to FUSE: Performance of User-Space file systems

BKR Vangoor, V Tarasov, E Zadok - 15th USENIX Conference on File …, 2017 - usenix.org
Traditionally, file systems were implemented as part of OS kernels. However, as complexity
of file systems grew, many new file systems began being developed in user space …

The design of fast and lightweight resemblance detection for efficient post-deduplication delta compression

W Xia, L Pu, X Zou, P Shilane, S Li, H Zhang… - ACM Transactions on …, 2023 - dl.acm.org
Post-deduplication delta compression is a data reduction technique that calculates and
stores the differences of very similar but non-duplicate chunks in storage systems, which is …

BloomFlash: Bloom filter on flash-based storage

B Debnath, S Sengupta, J Li, DJ Lilja… - 2011 31st International …, 2011 - ieeexplore.ieee.org
The Bloom filter is a probabilistic data structure that provides a compact representation of a
set of elements. To keep false positive probabilities low, the size of the Bloom filter must be …

A general-purpose counting filter: Making every bit count

P Pandey, MA Bender, R Johnson, R Patro - Proceedings of the 2017 …, 2017 - dl.acm.org
Approximate Membership Query (AMQ) data structures, such as the Bloom filter, quotient
filter, and cuckoo filter, have found numerous applications in databases, storage systems …

Sparse indexing: Large scale, inline deduplication using sampling and locality

M Lillibridge, K Eshghi, D Bhagwat, V Deolalikar… - FAST, 2009 - usenix.org
We present sparse indexing, a technique that uses sampling and exploits the inherent
locality within backup streams to solve for large-scale backup (e.g., hundreds of terabytes) the …

Extreme binning: Scalable, parallel deduplication for chunk-based file backup

D Bhagwat, K Eshghi, DDE Long… - … on Modeling, Analysis …, 2009 - ieeexplore.ieee.org
Data deduplication is an essential and critical component of backup systems. Essential,
because it reduces storage space requirements, and critical, because the performance of …