EMC Avamar: the deduplication way to go
It's as good as they say. Or is it?
EMC Avamar is a source-based, global deduplication solution that enables fast, efficient backup and recovery by reducing the size of backup data at the client before it is transferred across the network and stored. Avamar's data deduplication dramatically reduces network traffic by sending only unique blocks over the LAN/WAN. Blocks that were previously stored are never backed up again.
This means huge savings in backup bandwidth, far less disk storage needed on the back end, and, most importantly, very fast backups, often as much as ten times faster.
Avamar backups can be quickly recovered in a single step, eliminating the hassle of restoring a full backup and subsequent incrementals to reach the desired recovery point. Backup data is encrypted in transit across the network and at rest for added security.
There are three common approaches to data deduplication:
File-Level: File-level deduplication reduces storage needs for file servers by identifying duplicate files and providing an efficient mechanism for consolidating them. The most common implementation of single-instance storage is at the file level. With this method, a single change in a file results in the entire file being identified as unique. Example: if there are 3 versions of a file in a backup environment, all 3 files are stored in their entirety.
Fixed Block: Fixed-block deduplication is commonly employed in snapshot and replication technologies. This method breaks a file into fixed-length sub-objects. However, even a small change to the data can shift every fixed-length segment in a dataset, despite the fact that very little of the dataset has actually changed.
Variable Block: With variable-block deduplication, a change in a file results in only the variable-sized block containing the change being identified as unique. Consequently, more data is identified as common, and in the case of backup there is less data to store, as only the unique data is backed up. This is the method used by Avamar.
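The weakness of the fixed-block approach is easy to demonstrate. Here is a toy Python sketch (purely illustrative, nothing to do with Avamar's internals) showing how a single byte inserted at the front of a file shifts every fixed-length block, so none of the previously stored block hashes can be reused:

```python
# Toy demonstration: why fixed-block deduplication handles insertions poorly.
import hashlib

def fixed_blocks(data: bytes, size: int = 8):
    """Split data into fixed-length blocks and hash each one."""
    return [hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

v1 = b"the quick brown fox jumps over the lazy dog"
v2 = b"X" + v1  # one byte inserted at the front

h1, h2 = fixed_blocks(v1), fixed_blocks(v2)
shared = set(h1) & set(h2)
print(len(shared))  # 0 -- the insertion shifted every block boundary
```

A variable-block scheme avoids this because the block boundaries follow the content instead of fixed offsets, so everything after the insertion still lines up.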
Avamar provides systematic fault tolerance at several levels
To ensure system integrity, Avamar protects your data in several ways: RAID at the disk level, RAIN at the node level, replication to other Avamar systems, and checkpoints, which protect the server in the event of operational failures and provide redundancy across time. A checkpoint is a read-only snapshot of the Avamar server taken to facilitate server rollbacks. A further protection level is the high-availability uplink and dual switches, which provide high availability in the event of hardware failure. As you can see, the whole system is protected at several levels.
The key factor is the way Avamar deduplicates its data. Let's go into this briefly. The Avamar agent running on the backup client (avtar) traverses each directory in the backup. When the client is installed and your first backup runs, two cache files are generated: f_cache and p_cache. The f_cache is typically used for regular files, while the p_cache is used for database files. Avtar checks the client's file cache to see if a file has been backed up before; files that have previously been backed up are skipped. If there is no match in the file cache, sticky-byte factoring divides the file data into variable-sized chunks and compresses them. Each compressed data chunk is hashed; the hash created from a data chunk is referred to as an atomic hash. Atomic hashes are combined to create composites. Each atomic and composite hash is compared to the entries in the client's hash cache to determine whether it has been stored before. If there is no match in the hash cache, the hash cache is updated and the hash is sent to the Avamar server. If there is no match on the Avamar server either, the data corresponding to the hash is then sent to the server as well.
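That flow can be sketched roughly in Python. The cache, the compression step, and the "server" dictionary below are simplified stand-ins I made up for illustration; the real avtar client is far more involved:

```python
# Minimal sketch of the client-side deduplication flow, assuming a
# simplified in-memory hash cache and a dict standing in for the server.
import hashlib
import zlib

hash_cache = set()     # hashes this client has already processed
server_store = {}      # hash -> compressed chunk; stand-in for the Avamar server

def backup_chunk(chunk: bytes) -> str:
    """Compress and hash a chunk; send data only if the hash is unknown."""
    compressed = zlib.compress(chunk)
    digest = hashlib.sha1(compressed).hexdigest()  # the "atomic" hash
    if digest in hash_cache:
        return "skipped (client cache hit)"
    hash_cache.add(digest)
    if digest not in server_store:                 # server-side hash lookup
        server_store[digest] = compressed          # only now is data sent
        return "sent to server"
    return "hash known to server, data not sent"

print(backup_chunk(b"hello world"))  # sent to server
print(backup_chunk(b"hello world"))  # skipped (client cache hit)
```

Note the two-stage check: most duplicates are caught by the client-side cache before anything crosses the network, and even on a cache miss only the small hash travels to the server unless the data is genuinely new.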
With sticky-byte factoring, during backup processing avtar separates raw data into chunks, or objects, that vary in size between 1 byte and 64 KB, averaging about 24 KB.
Sticky-byte factoring always produces the same chunk boundaries as long as the data has not changed. Where data has changed since the previous backup, it locates the change and quickly re-synchronizes the chunking process to match the data chunks created during the previous backup.
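A toy content-defined chunker shows why this re-synchronization works. Because chunk boundaries are derived from the data itself (here a simple windowed checksum stands in for the real sticky-byte algorithm, whose details differ), an insertion only disturbs the first chunk and the boundaries line up again afterwards:

```python
# Toy content-defined chunker, illustrative only: boundaries depend on the
# data, so chunking re-synchronizes after an insertion shifts everything.
import hashlib
import random

def chunks(data: bytes, window: int = 4, mask: int = 0x1F):
    """Cut a chunk wherever a checksum of the last few bytes hits a pattern."""
    out, start = [], 0
    for i in range(window, len(data)):
        if sum(data[i - window:i]) & mask == mask:
            out.append(data[start:i])
            start = i
    out.append(data[start:])
    return out

random.seed(0)  # deterministic sample data
v1 = bytes(random.randrange(256) for _ in range(4000))
v2 = b"INSERTED" + v1  # everything shifted by an 8-byte insertion

h1 = {hashlib.sha1(c).hexdigest() for c in chunks(v1)}
h2 = {hashlib.sha1(c).hexdigest() for c in chunks(v2)}
# almost all chunks are still identical, because boundaries follow content
print(len(h1 & h2), "of", len(h1), "chunks reused")
```

Compare this with the fixed-block example earlier, where the same one-byte shift invalidated every block.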
Now it all comes down to backup. Avamar is ideally suited for protecting clients in VMware environments. Avamar 6.0 provides a high level of integration with VMware for backing up virtual environments, and gives you the flexibility of implementing a VMware backup solution in any of three ways. Avamar agents can be installed in the virtual machines for guest-level backups. Integration with VMware Consolidated Backup (VCB) involves installing the Avamar client software on the VCB proxy server; the proxy server handles the backup processing, reducing the impact on the client. Since Avamar release 5.0, Avamar also integrates with VMware VADP to provide image-level backups.
By default, Avamar automatically determines whether to perform a full or incremental backup of the storage device. The first backup (initialization) is always a full backup (level 0); each subsequent backup is an incremental (level 1). Avamar intelligently merges the previous backup data with the new level 1 incremental, the result being a new full backup. When there are multiple terabytes to back up from a file server, the first initialization can take hours or even days. In my case it took 38 hours to complete the initialization of a 1.2 TB backup.
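That merge of the previous full with the newest incremental (a "synthetic full") can be pictured as a simple dictionary merge. The layout below is purely illustrative, not Avamar's internal format:

```python
# Sketch of a synthetic full: old full + latest incremental = new full,
# so the newest backup is always a complete single-step restore point.
full_day0 = {"a.txt": "v1", "b.txt": "v1", "c.txt": "v1"}
incr_day1 = {"b.txt": "v2"}              # only b.txt changed since day 0

full_day1 = {**full_day0, **incr_day1}   # new full = old full + changes
print(full_day1["b.txt"])  # v2
print(len(full_day1))      # 3 -- still a complete restore point
```

This is why a restore never has to walk a chain of incrementals: every backup after the first behaves like a full.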
It's as good as they say. Or is it?
When I look back at my traditional tape backup solution, backup to disk is definitely better. Backup is a lot faster, and there are no more problems with backup windows. Data deduplication at the source is more efficient than at the target: checking first before sending anything over to your Avamar server is a big advantage, and it keeps data transfer to a minimum. A known minus is that when you back up through the Avamar agent, all of the work has to be done on the client side, so CPU load can be quite high. Tuning is possible, but I doubt the CPU load will ever disappear. Something to work on is the interface, which is in my opinion poor, especially when you look at the Unisphere interface of all the other EMC products. Another big question is: how long before you run out of physical capacity? Through partner channels you can use the EMC Capacity Planner, which gives you an excellent forecast of how your Avamar will fill up over time. In the beginning I was somewhat cautious. Why? That damn thing filled up faster than expected. Of course, when you add clients to the Avamar server it fills up quickly; that's normal behavior. After a while (when I was done filling Avamar with clients) Avamar predicted that the server would be full within 60 days. A few days later it was less than 50 days, and so on. Then I got really nervous. Did I make a mistake? After looking into it, nothing seemed suspicious. The only thing to do was keep an eye on it and watch retentions expire. My nervousness was unfounded, as you can see in the image below.
Summarized: Avamar utilizes source-based deduplication. Since Avamar is both the backup software and the backup-to-disk target, it can deduplicate the data before it ever leaves the client. Files are broken apart and deduplicated before any backup data is sent across the network, and only the changed blocks travel to the backup-to-disk target. This results in a reduction in network traffic, in the amount of data stored on disk, and in the time it takes you to back up. Avamar was originally designed as a completely tapeless solution; EMC later developed a "tape-out" functionality to retain the flexibility of backing up to tape. It's cool, it's functional, it's fast, and it does what it must do. Once set up, you have no worries afterwards. Well, that's not completely true: of course you still check your daily backups and the health of your Avamar server.
You can find more information about Avamar at http://www.emc.com/backup-and-recovery/avamar/avamar.htm