Many system administrators are not aware of a very useful feature available on Windows Server 2012 and 2016.
The Data Deduplication feature can save hundreds of MB of space on a file server. With it, the system searches for duplicate blocks on the volumes where deduplication is enabled and stores each duplicated block only once.
For end users and applications the process is completely transparent: users see their files stored exactly as before.
As an example let’s take a look at the following file structure:
- D:\Data\Sales\backup\SalesReport2015Presentation.pptx
- D:\Data\Sales\OLD-Backup\SalesReport2015Presentation.pptx
- D:\Data\Sales\2015Backup\SalesReport2015Presentation.pptx
- D:\Data\Sales\JacksonWorkingFiles\SalesReport2015Presentation.pptx
As you may have noticed, a file called SalesReport2015Presentation.pptx has been saved in four different locations on volume D:. If each copy takes 30 MB, together they use 120 MB of disk space. These days 120 MB may not seem like much, but on a big network with hundreds or thousands of users and thousands or even millions of files, this kind of duplication adds up to a lot of disk space.
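To make the arithmetic concrete, here is a minimal Python sketch of the example, using the four paths listed above and the 30 MB file size. The function name `dedup_estimate` is made up for illustration, and the model is idealized: it assumes identical copies collapse to a single stored copy, whereas real deduplication works on blocks and keeps some metadata, so actual figures will differ.

```python
def dedup_estimate(copies: int, size_mb: float) -> dict:
    """Idealized estimate for `copies` identical files of `size_mb` each."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    logical = copies * size_mb   # space used without deduplication
    stored = size_mb             # one copy's worth of unique blocks
    saved = logical - stored
    saved_pct = 100 * saved / logical if logical else 0.0
    return {
        "logical_mb": logical,
        "stored_mb": stored,
        "saved_mb": saved,
        "saved_pct": saved_pct,
    }


# Four copies of the 30 MB presentation, as in the list above.
result = dedup_estimate(copies=4, size_mb=30)
print(result)
```

Under these assumptions the four copies occupy about the space of one, which is where the savings come from.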
Space savings can exceed 80% for VHD files and 50% for general file shares.
A volume must meet the following requirements before Deduplication can be enabled:
- It can’t be the boot or system volume; it must be a data volume
- It must be formatted with NTFS
- It must be smaller than 64 TB
- It can’t be a removable disk
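The requirements above can be expressed as a simple checklist. The following Python sketch is illustrative only: the `Volume` class and `dedup_blockers` function are hypothetical names and not part of any Windows API, and the volume details are typed in by hand rather than read from the system.

```python
from dataclasses import dataclass
from typing import List

TB = 1024 ** 4                 # bytes in a binary terabyte
MAX_VOLUME_BYTES = 64 * TB     # volume must be smaller than this


@dataclass
class Volume:
    """Hypothetical description of a volume (not a Windows API type)."""
    letter: str
    file_system: str
    size_bytes: int
    is_boot_or_system: bool
    is_removable: bool


def dedup_blockers(vol: Volume) -> List[str]:
    """Return the requirements the volume fails (empty list if none)."""
    reasons = []
    if vol.is_boot_or_system:
        reasons.append("boot or system volume")
    if vol.file_system.upper() != "NTFS":
        reasons.append("not formatted with NTFS")
    if vol.size_bytes >= MAX_VOLUME_BYTES:
        reasons.append("not smaller than 64 TB")
    if vol.is_removable:
        reasons.append("removable disk")
    return reasons


data_volume = Volume("D:", "NTFS", 2 * TB, False, False)
system_volume = Volume("C:", "NTFS", 200 * 1024 ** 3, True, False)
print(dedup_blockers(data_volume))    # []
print(dedup_blockers(system_volume))  # ['boot or system volume']
```

Returning the list of failed requirements, rather than a plain yes or no, makes it easier to see why a given volume is not eligible.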
When enabling deduplication, you can exclude certain file types simply by adding their extensions, for example .zip or .cab. Folders can be excluded as well.
Files smaller than 32 KB and files encrypted with EFS (Encrypting File System) are not processed. You should configure a schedule so that files are processed at a convenient time, for example during off-peak hours; otherwise, Deduplication runs in the background in one-hour cycles.
The minimum age for a file to be processed by Deduplication is 3 days, but this can be changed if needed. The idea is that Windows Server will not process new files or files that are in use in day-to-day operations.
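The selection rules described above (exclusions, minimum size, EFS, minimum age) can be sketched as a single filter. This Python example is a rough model and not how Windows implements it: `FileInfo`, `is_candidate` and the excluded folder `D:\Data\Temp` are hypothetical, and the article does not say which timestamp the age is measured from, so the `timestamp` field is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import PureWindowsPath

MIN_SIZE_BYTES = 32 * 1024               # files smaller than 32 KB are skipped
MIN_AGE = timedelta(days=3)              # default minimum file age
EXCLUDED_EXTENSIONS = {".zip", ".cab"}   # example exclusions from the text
EXCLUDED_FOLDERS = [PureWindowsPath(r"D:\Data\Temp")]  # hypothetical folder


@dataclass
class FileInfo:
    """Hypothetical file record used only for this sketch."""
    path: str
    size_bytes: int
    timestamp: datetime          # assumption: the time the age is counted from
    efs_encrypted: bool = False


def is_candidate(f: FileInfo, now: datetime) -> bool:
    """Return True if the file passes every rule described in the text."""
    p = PureWindowsPath(f.path)
    if f.size_bytes < MIN_SIZE_BYTES or f.efs_encrypted:
        return False
    if p.suffix.lower() in EXCLUDED_EXTENSIONS:
        return False
    if any(folder in p.parents for folder in EXCLUDED_FOLDERS):
        return False
    return now - f.timestamp >= MIN_AGE


now = datetime(2016, 1, 10)
report = FileInfo(
    r"D:\Data\Sales\backup\SalesReport2015Presentation.pptx",
    30 * 1024 * 1024,
    datetime(2016, 1, 1),
)
print(is_candidate(report, now))  # True: large enough, old enough, not excluded
```

`PureWindowsPath` is used so that the path handling behaves the same on any operating system, including case-insensitive comparison of folder names.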
Let us leave the theory aside and talk about real-life experience. Below is a screenshot I took from one of my file servers.
As you can see, I’m saving 597 GB on drive G:. This drive is an “ISO repository” we use to install different software in our classrooms.
Drives E: and C:\genetal are shared folders where our trainers keep the files they use when teaching. These folders hold thousands of Word, Excel, PowerPoint and Access files and images. On drive E: I’m saving 10.6 GB.
Below is a screenshot from my Backup Server.
On this server I keep a backup of all the files used in our training centre: VHDs for hundreds of Hyper-V virtual machines, images used to deploy classrooms, ISO files, and Word, Excel and PowerPoint documents.
As you can see, I'm saving 2.81 TB on drive F: on this server. I have enabled Deduplication on drives D:, E: and G: as well, but at the time I took the screenshot it hadn't processed those files yet. With a couple of clicks on two of our servers I have managed to save about 3 TB so far. There is more to come!
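As a quick tally, the per-drive savings quoted above can be added up. This small Python sketch assumes 1 TB = 1024 GB (the binary convention); with decimal units the total changes only slightly.

```python
GB_PER_TB = 1024  # assumption: binary units

# Savings quoted in the article, converted to GB.
savings_gb = {
    "file server G:": 597,
    "file server E:": 10.6,
    "backup server F:": 2.81 * GB_PER_TB,
}

total_gb = sum(savings_gb.values())
total_tb = total_gb / GB_PER_TB
print(f"{total_gb:.1f} GB = {total_tb:.2f} TB")
```

The total comes to roughly 3.4 TB, in line with the rounded figure of about 3 TB.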
The Deduplication configuration is done through Server Manager / File and Storage Services.
The screenshot below shows how to add the Data Deduplication feature.
The screenshot below shows how to enable Data Deduplication on one of the volumes.
Good luck with your Data Deduplication implementation.
New Horizons Ireland Instructor