ZeroFiller (for Windows!)
This might be a relatively niche utility, but I figured I’d share it because it might help someone in the same situation.
- The ZeroFiller executable only (requires .NET 4.5.2)
- The ZeroFiller source code (requires >= Visual Studio 2017, probably)
I run a lot of VMs and I’m a bit obsessed with backups. I do high-level backups (meaning files, DB data, configs, etc.), as well as low-level backups of the actual VM disk images. The raw virtual disks get snapshotted on the host system, then the snapshot’s raw data is backed up using gzip.
It’s not fancy, and it basically grabs the machine in a crashed state (the same state it would be in were the VM to be destroyed/terminated/powered-off arbitrarily). Most people will tell you not to do that, but it works. Plus I have the high-level (and more frequent) backups to rely on.
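For the curious, the whole low-level pipeline amounts to "snapshot, then gzip the raw bytes." Here's a sketch using a stand-in image file; on the real host the input would be the snapshot device, and all paths here are made up:

```shell
# Sketch of the low-level backup: compress the snapshot's raw bytes.
# A hypothetical LVM snapshot device (e.g. /dev/vg0/vm-snap) would be
# the input in real life; a small stand-in file is used here instead.
DISK_IMAGE=/tmp/vm-disk.img
dd if=/dev/zero of="$DISK_IMAGE" bs=1M count=4 status=none  # stand-in raw image
gzip -c "$DISK_IMAGE" > /tmp/vm-disk.img.gz                 # the actual backup step
ls -l /tmp/vm-disk.img.gz
```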
What’s the Point?
The point is that when gzip’ing the disk image, it compresses not only the active data on the VM’s file system, but also all the garbage that has been de-allocated by that file system. In other words, all the churn from log files, temp files, software updates, and all the other crap that gets written (or overwritten) to a hard drive and deleted. An obvious example would be if you were to save a 50GB zip file to the VM’s drive and then delete it. That 50GB of “random-ish” (read: uncompressible) data would still exist on the raw disk image; the file system simply would have “forgotten” about it. But a low-level backup of the disk image would still have to deal with it.
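To make that concrete (scaled way down from 50GB), here's what gzip does with "random-ish" data versus zeroes; the file names and sizes are just for illustration:

```shell
# Compare gzip's results on "random-ish" data vs. zeroes: 1 MiB of each.
head -c 1048576 /dev/urandom > /tmp/random.bin
head -c 1048576 /dev/zero    > /tmp/zeros.bin
gzip -kf /tmp/random.bin /tmp/zeros.bin       # -k keeps the originals
stat -c '%n %s' /tmp/random.bin.gz /tmp/zeros.bin.gz
```

The random data stays essentially full size (slightly larger, even, due to gzip overhead), while the zeroes collapse to around a kilobyte.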
One way around that — the simple way, the easy way, and the way of ZeroFiller — is to write a whole crapload of zeroes to the VM’s file system and then delete them. BTW, this also has the added benefit of destroying, to a somewhat satisfactory level, deallocated data on the virtual disk.
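On a Linux guest, the Bash equivalent of the whole trick is about three lines. This sketch uses /tmp and a tiny fixed count so it doesn't actually fill anything; for real use you'd point it at the guest's disk and let dd run until the disk is nearly full:

```shell
# Fill free space with zeroes, then delete the file. TARGET would live
# on the VM's disk in real use; /tmp and a tiny count stand in here.
TARGET=/tmp/zerofill.blob
dd if=/dev/zero of="$TARGET" bs=1M count=8 status=none  # real use: no count, run until near-full
sync                       # make sure the zeroes actually hit the disk image
rm "$TARGET"               # the filesystem forgets them; the raw image keeps them
```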
So that’s what ZeroFiller does. It’s a command-line Windows app…
Oh right, I should mention that this is for Windows. I don’t have many Windows VMs, but I do have some. This stuff is way easy to do in, for example, a Bash script. Let alone Perl or some other thing. Sure, it can be done in Windows PowerShell, but it’s not as fast nor as fun.
…that will nearly fill up the disk with zeroes. In fact, here’s what it prints when you start it with no options:
```
Fills a disk up with a binary file consisting of all zeroes, for the
purpose of making the disk image compress well during backups or
migrations.

Usage: ZeroFiller.exe X: [options]

Options:
  /fn:xxxxx - FileName, including optional path. (DEFAULT @ \ZeroFiller.blob)
  /fs:xxxM  - FreeSpace to leave in MiB.
  /fs:xx%   - FreeSpace as percent of disk size. (DEFAULT option @ 11%)
  /bs:xxxx  - Block size in MiB. (DEFAULT @ 100MiB)
  /nd       - NoDelete, won't delete file after. This is a bad idea;
              mainly for debug.
  /do       - DeleteOnly. Only deletes the blob file. Good to run
              periodically in case it gets interrupted. Overrides the
              /nd flag if it's set.
```
Obviously “X:” can be replaced with any drive letter. It might work with UNC paths if you change the validation RegEx, I dunno. Go nuts.
When the program is run, it creates the file, by default, at \ZeroFiller.blob. You can specify any file name and path using the /fn option, but the path must be relative to the drive letter given as the first argument. If that file already exists, it will simply append to it until it reaches the specified relative size. If the file is already at (or greater than) the required size, it will just delete it. Unless, of course, the /nd flag is specified.
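That create-or-append-or-delete logic is simple enough to sketch in shell (toy sizes and a made-up path; the real thing is in the C# source):

```shell
# Mimic ZeroFiller's blob handling: append zero blocks until the target
# size is reached; if the file is already big enough, just delete it.
BLOB=/tmp/ZeroFiller.blob
TARGET_BYTES=$((4 * 1024 * 1024))   # toy target; the real one comes from /fs
BLOCK_BYTES=$((1024 * 1024))        # toy stand-in for /bs

if [ -e "$BLOB" ] && [ "$(stat -c%s "$BLOB")" -ge "$TARGET_BYTES" ]; then
    rm "$BLOB"                       # a previous run finished: just clean up
else
    while [ "$(stat -c%s "$BLOB" 2>/dev/null || echo 0)" -lt "$TARGET_BYTES" ]; do
        head -c "$BLOCK_BYTES" /dev/zero >> "$BLOB"   # append one block of zeroes
    done
    rm "$BLOB"                       # normal run: delete after filling (no /nd here)
fi
```

A typical real invocation would look something like `ZeroFiller.exe D: /fs:20% /bs:200`, using the flags documented above.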
The /fs option is really the only one you should put some thought into. Depending upon the size of the disk and the speed of the underlying storage, it could take a significant amount of time to fill the drive with zeroes. You don’t want to fill up the drive at the same time that you’re rebuilding a DB index, for example. But if you can’t avoid that, then maybe leave more than the default 11% free. Tweak it to your needs.
You may also want to tune /bs to your storage, memory, and I/O. The program writes each block in one [possibly giant] I/O call, which may be undesirable. It’s also important to note that it only checks for disk free space between writes, so if you specify a silly block size of, e.g., 50GiB on a disk that only has 20GiB free, it will fill up your disk completely and crash your system. (Though it may first simply use up all of your RAM and swap and crash your system that way.) So I think 100MiB is a safe default.
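The important part is the free-space check between writes. Here's a sketch of that loop, assuming GNU `df`; `RESERVE_MB` stands in for the /fs value, and the demo caps itself at a few tiny blocks so it won't actually fill your disk:

```shell
# Write zero blocks, re-checking free space before each block so the
# next write (or some other process) can't blow through the reserve.
MOUNT=/tmp                # the volume to fill (stand-in)
BLOB=/tmp/zf-demo.blob
BLOCK_MB=1                # /bs equivalent, tiny for the demo
RESERVE_MB=64             # /fs equivalent: free space to leave alone
MAX_BLOCKS=4              # demo-only cap; the real loop runs to the reserve

i=0
while [ "$i" -lt "$MAX_BLOCKS" ]; do
    free_mb=$(df --output=avail -BM "$MOUNT" | tail -1 | tr -d 'M ')
    if [ "$free_mb" -le "$((RESERVE_MB + BLOCK_MB))" ]; then
        break             # the next block would eat into the reserve
    fi
    head -c "$((BLOCK_MB * 1024 * 1024))" /dev/zero >> "$BLOB"
    sync                  # flush each block, like ZeroFiller does
    i=$((i + 1))
done
rm -f "$BLOB"
```

Because free space is re-read every iteration, a smaller block size means a tighter safety margin at the cost of more I/O calls.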
Beware of filesystem compression! I realize that you may have compression enabled on an entire volume. Obviously “[..]00000000[..]” is really compressible. I mean, you could theoretically store petabytes of zeroes in 128 bits (or less, depending). NTFS compression really isn’t that clever. (Nor would you want it to be; a file can be “accidentally” well-compressible, but change a few bytes and it may no longer be. Suddenly a “small” file could turn into a “large” file on disk. Surprise! Oh, and it would brutalize your CPU.)
But I digress. My point is that you should definitely disable compression on the directory in which the blob file will be stored. On the other hand, having compression enabled on the host system is just fine. In fact, it would lead to very fast I/O when writing the zeroes (uh, provided your CPU is up to the task).
When I started banging this out, I was just going to create a file of XXX MiB. I realized that would be stupid: what if the system didn’t have XXX MiB available? So instead, it monitors your disk’s free space as it writes, and it flushes to disk after every block-size MiB written. That way it counts its own file against the free space, as well as any other files being generated. That’s important if, say, your Exchange server decides to shit out a bunch of new log files, or you have a local backup running, or your DB server re-indexes something.
You should be able to allow ZeroFiller to fill your disk very near its capacity without a problem, but keep in mind that some programs may allocate a large file in one I/O operation. If that happens, it’s possible that the disk could become 100% full. So, uh, be careful.
ZeroFiller will obviously create some decent I/O on your system, so run it during non-peak hours. Depending upon the amount of churn on your VM, you’ll hopefully be able to schedule the task to run infrequently.
On most servers it’s once per month on a Sunday at 3AM-ish for me. I stagger it so it doesn’t run on two VMs on the same disk subsystem at the same time. That’s probably best practice. Running this on something like 10 VMs on the same host at the same time will probably result in high storage latency everywhere.
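If you want to set up the same schedule, the built-in Task Scheduler can do "first Sunday of the month at 3 AM" from the command line. This is a sketch with hypothetical paths (and it assumes a wrapper through cmd so the console output lands in a log file):

```
:: Hypothetical paths; runs ZeroFiller on the first Sunday of each
:: month at 3 AM and appends its console output to a log file.
schtasks /Create /TN "ZeroFiller C" ^
  /TR "cmd /c C:\Tools\ZeroFiller.exe C: /fs:15% >> C:\Logs\ZeroFiller.log 2>&1" ^
  /SC MONTHLY /MO FIRST /D SUN /ST 03:00
```

Staggering the VMs is then just a matter of giving each one a different /ST time (or a different Sunday).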
I included a link to the source file at the top of this post. It is shit, I get it. It’s basically ~250 lines in the Main method, plus one other small method. OMG spaghetti code!!!
Before you crap on me for doing it this way: 1) it’s the equivalent of a Bash script, which wouldn’t be OO anyway, and 2) it doesn’t need to be more complicated than this. Yeah, if it were Java it would have to be 5,000 classes and 100,000 try/catch blocks with a wide mix of protections, an assload of interfaces, and 50 external libraries for no damn reason.
Seriously though, the vast majority of the code is just dealing with the args and printing useful messages to console (which you should redirect in a batch file to a log file, which is what I do). The actual business of writing the blob file is like 30 lines. There’s no reason to make this more complicated for its current level of functionality.
But if you want to add some functionality, clean it up, and/or fix any bugs, please let me know! I’ll be happy to add your update for download here. (You’d better make it open-source if you do mess with it. In fact, boom: it’s under a Creative Commons – Attribution license. I don’t even know if that applies to software, but that’s the spirit.)
Obviously I only write bug-free code. Especially in <300 lines, how could I possibly screw anything up? As impossible as it may seem, there may be a bug or something funky in here. I take no responsibility for any problems that may cause for you. All I know is that I’ve been running it on production systems for a few months, and so far haven’t had any problems.
In fact, here are some things I would add/fix if I had the inclination:
- Some better manner of writing the file, such that an expensive and manual GC call doesn’t have to be made every so often. (Note that all the flushing and object disposal is due to Windows not otherwise immediately recognizing the growth of the blob file. It’s not for no reason.)
- If the GC calls are necessary, perhaps a command-line argument to tune memory usage. Right now it calls the GC every 2GiB-ish of blob file writes.
- Better exception handling. I kinda wrapped all the good stuff in a single try. Whoops. :)
- A configurable delay (e.g. Thread.Sleep(ms)) between blocks so that this doesn’t hog I/O.
- A max-runtime argument to prevent this from running all day in the event of high system I/O.
- An I/O latency threshold, where the application will either terminate or slow itself down in the event of high I/O latency.
- A memory utilization threshold, in case the app gets out of hand due to a bug or mis-configuration.
- The option to create multiple files each at a max size of XXX, for… reasons.
- A test (and resultant warning) if ZeroFiller is set to write the blob file to a volume or directory that has compression enabled. And/or have ZeroFiller set the blob file’s compression attribute to “hells no”.
That’s just off the top of my head. I didn’t see a compelling need for most of those things for my use case, but now that I think about it I should probably go back and build a few of those in. Ah well.