Title: Introduction to Storage Deduplication for the SQL Server DBA
1. Introduction to Storage Deduplication for the SQL Server DBA
SQLDBApros, SQL Server DBA Professionals
2. Introduction to deduplication
- SQL Server DBAs across the industry are increasingly asked to place database backups on deduplication storage, and to decide whether compressing backups destined for deduplication storage is a good idea.
3. Let us explain
- Deduplication is not a new term; it has circulated widely over the past few years as major companies began releasing deduplicating storage devices. Deduplication simply means not saving the same data repeatedly.
4. Imagine this
- Imagine the original data as separate files. The same files multiply as multiple users save them to their home directories, creating an excess of duplicates that contain identical information. The object of deduplication is to store unique information once, rather than repeatedly.
- In this example, when a copy of a file is already saved, subsequent copies simply point to the original data rather than saving the same information again and again.
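The file-level scheme described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation; `store`, `catalog`, and `save_file` are made-up names:

```python
import hashlib

# Toy file-level deduplication: store each unique file body once, and let
# subsequent saves record only a pointer to the existing copy.
store = {}    # content fingerprint -> file bytes (saved once)
catalog = {}  # user path -> content fingerprint (the "pointer")

def save_file(path, data):
    fingerprint = hashlib.sha256(data).hexdigest()
    if fingerprint not in store:
        store[fingerprint] = data   # first copy: actually save the data
    catalog[path] = fingerprint     # every copy: just record a pointer

save_file("/home/alice/report.docx", b"quarterly numbers...")
save_file("/home/bob/report.docx", b"quarterly numbers...")

print(len(store))    # 1 -- the second save added only a pointer
print(len(catalog))  # 2 -- both users still "have" the file
```

Two users saved the same file, but only one copy of the bytes exists; the catalog entries are the pointers back to the original data.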
Download Your Free Trial of our Backup
Compression Tool
5. Chunking
- Files enter the process intact, as they would in any storage; they are then deduplicated and compressed.
- On many appliances, data is processed in real time. Unlike the file example above, deduplication appliances are more sophisticated.
- Most deduplication appliances use an architecture in which incoming data is deduplicated inline (before it hits the disk), and the unique data is then compressed and stored.
- Sophisticated algorithms break files into smaller pieces; this is called chunking. Most chunking algorithms offer variable block sizes and sliding windows, which tolerate changes within files with little loss of deduplication.
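The sliding-window idea can be sketched with a toy content-defined chunker. Real appliances use much stronger rolling hashes (Rabin fingerprints, for example) and tuned chunk-size limits; the constants and the additive window sum below are illustrative assumptions, but the cut-point idea is the same:

```python
# Toy content-defined chunking: cut a chunk boundary wherever a rolling
# value over a sliding window hits a target, so boundaries depend on the
# data itself rather than on fixed offsets.
WINDOW = 16     # bytes in the sliding window (assumed)
DIVISOR = 64    # controls average chunk size (assumed)
MIN_CHUNK = 32  # never cut chunks smaller than this (assumed)

def chunk(data):
    chunks, start = [], 0
    for i in range(len(data)):
        if i - start < MIN_CHUNK:
            continue                         # enforce a minimum chunk size
        window = data[max(start, i - WINDOW):i]
        if sum(window) % DIVISOR == 0:       # content-defined cut point
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])              # final (possibly short) chunk
    return chunks

payload = bytes([(i * 131) % 256 for i in range(2000)])
assert b"".join(chunk(payload)) == payload   # chunking is lossless
```

Because boundaries are chosen by content rather than by position, an insertion early in a file shifts only the chunks near the change; later cut points realign, which is what preserves deduplication across edits.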
6. Fingerprints
- For instance, if a one-line change is made in one of several nearly identical files, sliding windows and variable block sizes break the larger file into smaller pieces and store only the changed information, rather than treating the one-line change as an entirely new file.
- Each chunk of data is hashed, which you can think of as taking its fingerprint. If the system encounters a piece of data bearing a fingerprint it recognizes, it merely updates the file map and reference count without saving that data again.
- Unique data is saved and compressed.
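The fingerprint-and-reference-count step can be sketched as follows. `write_chunk`, `chunk_store`, and `refcounts` are hypothetical names for illustration; a real appliance does this inline in its own engine and also compresses the unique chunks:

```python
import hashlib

# Toy chunk store: hash each chunk (its "fingerprint"); store unique
# chunks once and bump a reference count for chunks seen before.
chunk_store = {}  # fingerprint -> chunk bytes (unique data only)
refcounts = {}    # fingerprint -> number of file-map references

def write_chunk(chunk):
    fp = hashlib.sha256(chunk).hexdigest()  # the chunk's fingerprint
    if fp in chunk_store:
        refcounts[fp] += 1                  # recognized: update count only
    else:
        chunk_store[fp] = chunk             # unique: save (and compress)
        refcounts[fp] = 1
    return fp                               # the file map records this fp

file_map = [write_chunk(c) for c in [b"aaa", b"bbb", b"aaa"]]
print(len(chunk_store))  # 2 -- three chunks written, two stored
```

The file map keeps the order of fingerprints, so the repeated chunk costs only a map entry and a reference-count update, not a second copy of the data.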
7. Reduce storage needs
- Both target-side and source-side (at the server itself) deduplication reduce demand for storage by eliminating redundancies in data and reducing the amount of data sent across the network.
- In source-side deduplication, vendor APIs can quickly query the storage device to see whether chunks of data already reside on it, rather than sending all of the bits across the network for the storage device to process.
8. Replication
- Replication is another key feature.
- The replication features found in today's deduplication appliances are a boon for DBAs.
- They allow backup data from one data center to be replicated easily to another by moving only the needed deduplicated and compressed chunks.
9. Rehydration
- Deduplication does mean that a file will eventually have to be rehydrated (put back together) when it is needed.
- Bits are read, decompressed, and reassembled.
- Read speed may slow compared to non-deduplicated storage because of the needed decompression, reassembly, and transmission over the network.
- Additionally, fragmentation on the storage device can add rehydration overhead.
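Rehydration can be sketched as the reverse of the fingerprinting step: walk the file map in order, look up each stored chunk, decompress it, and concatenate. The store layout and names here are illustrative assumptions:

```python
import zlib

# Toy rehydration: the file map lists fingerprints in order; each lookup
# fetches a compressed chunk (possibly from a scattered location on disk,
# which is where fragmentation overhead comes in), decompresses it, and
# appends it to the rebuilt file.
store = {
    "fp1": zlib.compress(b"BACKUP "),
    "fp2": zlib.compress(b"DATA "),
}
file_map = ["fp1", "fp2", "fp1"]  # deduplicated chunks may repeat in the map

def rehydrate(file_map, store):
    return b"".join(zlib.decompress(store[fp]) for fp in file_map)

print(rehydrate(file_map, store))  # b'BACKUP DATA BACKUP '
```

Every repeated fingerprint costs a fresh lookup and decompression at read time, which is why rehydrated reads can run slower than reads from plain storage.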
10. Learn More
- Backup Compression Tool Free Trial
- Download Compression Whitepaper
- Follow us on Twitter @SQLDBApros