Data Deduplication

The Data Authority RBV improves the efficiency of its synthetic full backup by applying Deduplication technologies at two levels: File-Level and Block-Level.

File-Level Deduplication:

To achieve file-level Deduplication, the RBV generates a collective summary of the files that are saved on both the source device and the Vault.

If multiple computers are backed up to the same data repository, they will frequently share common files, including corporate literature, user files that have been shared with others, and even system files. Traditional solutions must save a separate copy of the file for each computer. Instead, the RBV recognizes that files are duplicated on multiple systems and saves each shared file only once, regardless of the number of computers sharing the file. In addition to saving time and bandwidth, this approach can yield substantial savings in storage space.
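The idea of storing a shared file once for many computers can be sketched as a small content-addressed store. This is only an illustration under stated assumptions: the `Vault` class, its method names, and the use of SHA-256 as the file summary are hypothetical and are not taken from the RBV itself.

```python
import hashlib


class Vault:
    """Toy content-addressed store: each unique file body is kept once.

    Hypothetical illustration only; not the RBV's actual data model.
    """

    def __init__(self):
        self.blobs = {}    # content hash -> file bytes (stored once)
        self.catalog = {}  # (computer, path) -> content hash (the "pointer")

    def backup_file(self, computer, path, data):
        """Record a file; return True only if its bytes had to be stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.catalog[(computer, path)] = digest
        return is_new

    def restore_file(self, computer, path):
        """Follow the catalog pointer back to the single stored copy."""
        return self.blobs[self.catalog[(computer, path)]]


vault = Vault()
handbook = b"Corporate handbook, 2010 edition"
stored_first = vault.backup_file("pc-01", "C:/docs/handbook.pdf", handbook)
stored_second = vault.backup_file("pc-02", "D:/shared/handbook.pdf", handbook)
```

In this sketch the second computer's copy adds only a catalog entry, so storage grows with the number of distinct files rather than with the number of computers.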

Technically, the synthetic full backup uses file-level Deduplication between points-in-time. If a file has not changed on the source device and is already present on the Vault, it is not re-transmitted; instead, a “pointer” to the existing copy is added on the Vault. The efficiency of this process is enhanced by leveraging the built-in journaling capabilities of Windows XP, Windows Vista, Windows 7, Windows Server 2003, and Windows Server 2008 systems. This type of journaling tracks file changes on the system and automatically collects data about those changes. In turn, the RBV simply reads the compiled journal data to identify changed files. Without this automatic journaling, a backup solution would need to spend significant time and resources scanning the entire file system to glean the same information.
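A minimal sketch of this point-in-time behavior follows. It assumes that a manifest maps each path to a content hash and that the journal is available as a plain list of changed paths. The function name `synthetic_full` and its parameters are hypothetical, and no real Windows journal API is called.

```python
import hashlib


def synthetic_full(previous_manifest, changed_paths, read_file, vault_blobs):
    """Build the next point-in-time manifest (path -> content hash).

    previous_manifest: manifest of the prior point-in-time
    changed_paths:     paths a (hypothetical) change journal reported as changed
    read_file:         callable returning current bytes, or None if deleted
    vault_blobs:       dict of content hash -> bytes already on the Vault
    Returns (new_manifest, bytes_transmitted).
    """
    manifest = dict(previous_manifest)  # unchanged files: pointers are reused
    sent = 0
    for path in changed_paths:
        data = read_file(path)
        if data is None:                # file was deleted on the source
            manifest.pop(path, None)
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest not in vault_blobs:   # only unseen content is transmitted
            vault_blobs[digest] = data
            sent += len(data)
        manifest[path] = digest
    return manifest, sent


source = {"a.txt": b"alpha", "b.txt": b"bravo"}
blobs = {}
monday, sent_monday = synthetic_full({}, list(source), source.get, blobs)

source["b.txt"] = b"bravo v2"           # the journal would report only b.txt
tuesday, sent_tuesday = synthetic_full(monday, ["b.txt"], source.get, blobs)
```

Each manifest still describes a full backup, yet the second run reads and sends only the one changed file; the entry for `a.txt` is carried over as a pointer without touching the file system.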

Block-Level Deduplication:

To optimize the transmission of changed files, the RBV uses block-level Deduplication technologies.

In this scenario, the RBV maintains a Fingerprint of the previous version of the file, since a copy of that version already exists on the appliance. The new version of the file is compared to the Fingerprint to determine whether sections of the file are new, unchanged, or simply shifted within the file. The new and changed sections are combined to make a Delta, and only this data is transmitted to the Vault. The unchanged sections are not transmitted to the appliance; instead, the Client sends a list of pointers that refer to the data that is already stored on the Vault.
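The comparison described above can be sketched as follows. This is an illustrative approach, not the RBV's actual algorithm: the Fingerprint is assumed to be a table of SHA-256 hashes of fixed-size blocks, the block size of 4 bytes is chosen only to keep the example readable, and the names `fingerprint` and `make_delta` are hypothetical. Checking every byte offset is what lets shifted sections still match; production tools typically use a rolling checksum to make that scan cheap.

```python
import hashlib

BLOCK = 4  # tiny block size for illustration; real systems use far larger blocks


def fingerprint(old: bytes) -> dict:
    """Map the hash of each full fixed-size block of the old version to its index."""
    table = {}
    for i in range(0, len(old) - BLOCK + 1, BLOCK):
        table.setdefault(hashlib.sha256(old[i:i + BLOCK]).digest(), i // BLOCK)
    return table


def make_delta(new: bytes, table: dict) -> list:
    """Return ops: ("ptr", block_index) for known data, ("data", bytes) for new data."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + BLOCK]
        index = None
        if len(chunk) == BLOCK:
            index = table.get(hashlib.sha256(chunk).digest())
        if index is not None:
            if literal:                     # flush pending new bytes first
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("ptr", index))      # block already held on the Vault
            pos += BLOCK
        else:
            literal.append(new[pos])        # byte is new; slide forward by one
            pos += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


old_version = b"AAAABBBBCCCC"
new_version = b"xxAAAABBBBZZZZ"             # two bytes inserted, last block changed
delta_ops = make_delta(new_version, fingerprint(old_version))
```

Here the blocks `AAAA` and `BBBB` are found even though the insertion shifted them by two bytes, so only the six new bytes travel as data and the rest is expressed as pointers.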

During a restore, this Delta is combined with the data indicated by the pointers to reconstruct the full file. This permits the RBV to reconstruct the desired file while transmitting and storing a substantially smaller amount of data. In large Exchange and SQL environments, Deltas are frequently 2% to 10% of the original file size.
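Reconstruction during a restore can be illustrated with a short sketch. It assumes a hypothetical Delta format: a list of operations in which `("ptr", n)` refers to the n-th fixed-size block of the previous version held on the Vault and `("data", b"...")` carries bytes that were actually transmitted. The format, the 4-byte block size, and the function name `apply_delta` are assumptions made for this example.

```python
BLOCK = 4  # must match the block size used when the Delta was built


def apply_delta(ops, stored_old: bytes) -> bytes:
    """Rebuild the new file from pointer ops plus transmitted data ops."""
    out = bytearray()
    for kind, value in ops:
        if kind == "ptr":
            # Copy a block that is already stored on the Vault.
            out += stored_old[value * BLOCK:(value + 1) * BLOCK]
        elif kind == "data":
            # Append bytes that arrived in the Delta.
            out += value
        else:
            raise ValueError(f"unknown op kind: {kind!r}")
    return bytes(out)


stored_old = b"AAAABBBBCCCC"
delta = [("data", b"xx"), ("ptr", 0), ("ptr", 1), ("data", b"ZZZZ")]

restored = apply_delta(delta, stored_old)
transmitted = sum(len(value) for kind, value in delta if kind == "data")
```

In this toy case 6 of the 14 restored bytes were transmitted. The ratio is far lower for large files in which only a small region changes, which is the situation the 2% to 10% figure describes.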