Have you ever wondered what the largest files on your local disk are? Well, so did I. But I had two constraints in mind:
I didn’t want to use any third-party tool to process the disk scan.
I was absolutely not going to scan it manually.
This article will show you, step by step, how I did it. But before we dive in, let me show you the final Tableau data visualizations, which are quite satisfying!
Extension total sizes, grouped by usefulness
There are a lot of files without any extension (light blue on the left-hand side).
The ucas files from Unreal Engine archives actually make sense, as I do play Fortnite.
The vsix files are Visual Studio Code extensions. I still wonder how they ended up on my computer, since I only use Sublime Text as my main editor…
I didn’t realize how big my png photos were until this chart revealed it.
Extensions with their total sizes and number of files, grouped by usefulness
On average, OS files are bigger than non-OS ones.
There are more than 150k files without any extension (I assumed they belong to the OS, but who knows?).
There are only 171 ucas files, which means that a typical ucas file is larger than the average file.
I honestly should reclaim the 2 GB taken up by useless vsix files.
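The per-extension totals, counts and averages behind this chart were built in Tableau. For readers who prefer code, here is a minimal Python sketch of the same aggregation. The function name `summarize_by_extension` and the sample paths and sizes are made up for illustration and are not taken from the actual scan.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def summarize_by_extension(files):
    """Aggregate (path, size_in_bytes) pairs into per-extension statistics."""
    stats = defaultdict(lambda: {"count": 0, "total": 0})
    for path, size in files:
        # Files without an extension are grouped under a "(none)" label.
        ext = PureWindowsPath(path).suffix.lower() or "(none)"
        stats[ext]["count"] += 1
        stats[ext]["total"] += size
    for entry in stats.values():
        entry["average"] = entry["total"] / entry["count"]
    return dict(stats)


# Hypothetical sample data, only to show the shape of the result.
sample = [
    ("C:/Games/pak1.ucas", 4_000_000_000),
    ("C:/Games/pak2.ucas", 2_000_000_000),
    ("C:/Users/me/photo.png", 5_000_000),
    ("C:/Windows/System32/drivers/etc/hosts", 1_000),
]
summary = summarize_by_extension(sample)
```

A small count combined with a large total, as with the ucas files, shows up here as a high `average` value.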
Number of files per folder depth
There are 24 levels of folders, where the first one is the disk itself, C:/.
The most used directories are generally between the 4th and 12th levels.
The 6th level doesn’t contain a lot of files: folders at this depth must contain mostly subdirectories.
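To make the notion of folder depth concrete, here is a small Python sketch of one way to compute it, assuming (as in the chart) that the drive root C:/ counts as level 1. The helper names `folder_depth` and `files_per_depth` and the sample paths are hypothetical.

```python
from collections import Counter
from pathlib import PureWindowsPath


def folder_depth(file_path):
    """Depth of the folder holding the file, with the drive root as level 1."""
    parts = PureWindowsPath(file_path).parts
    # parts looks like ('C:\\', 'Users', 'me', 'notes.txt');
    # dropping the file name leaves one entry per folder level.
    return len(parts) - 1


def files_per_depth(paths):
    """Count how many files sit at each folder depth."""
    return Counter(folder_depth(p) for p in paths)


# Hypothetical sample paths.
paths = [
    "C:/pagefile.sys",
    "C:/Users/me/notes.txt",
    "C:/Users/me/todo.txt",
    "C:/Users/me/AppData/Local/cache.bin",
]
depth_counts = files_per_depth(paths)
```

A depth that is missing from the counter, or has a very low count, is one where folders mostly hold subdirectories rather than files.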
Folder depths grouped by usefulness
1 dot = 1 file
1 color = 1 folder
Y axis = folder depth starting with 1, from top to bottom
The deeper we go (to greater directory depths), the fewer files there are.
The empty areas among non-OS files correspond to folders that contain only OS files.
Among OS files, those large lined-up areas stand for Microsoft Services files:
Among non-OS files, the large pink and green lines stand for %AppData% subfolders, where cached data is generated and stored:
How I did it
Gathering files details
Before building the visualizations above, the first step is obviously to gather data. I just ran the following two lines of code from my cmd terminal:
Note that the output file is stored outside the scanned disk so that it doesn’t interfere with the scan.
The output looks like this:
Quite ugly, right? Let’s do some cleaning.
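For readers who would rather script this step, here is a minimal Python sketch of the same idea. It is not the cmd approach used in this article, only an illustration under assumptions: the function names, the tab-separated output layout and the `D:/scan.tsv` path in the usage comment are all hypothetical.

```python
import os


def scan_tree(root):
    """Yield (full_path, size_in_bytes) for every readable file under root."""
    # os.walk silently skips directories it cannot list (its default behavior).
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full_path)
            except OSError:
                # Locked or vanished files are skipped rather than aborting.
                continue
            yield full_path, size


def write_listing(root, out_path):
    """Write one 'path<TAB>size' line per file and return the file count."""
    count = 0
    with open(out_path, "w", encoding="utf-8", errors="replace") as out:
        for full_path, size in scan_tree(root):
            out.write(f"{full_path}\t{size}\n")
            count += 1
    return count


# Example call (hypothetical paths); the output lives on another drive
# so that the listing file is not picked up by the scan itself:
# write_listing("C:/", "D:/scan.tsv")
```

Writing the listing to a different drive mirrors the note above about keeping the output file off the scanned disk.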
This step can be done in any software or programming language that you like. In my case, I directly used Tableau Software.
I import the initial file as a text file, using an arbitrary unused character as the delimiter. This way, each line lands in a single raw column, and I can build all the new calculated fields from the raw data manually. In my case, I used ^, as seen in this screenshot from the (French version of) Tableau Software Desktop:
I create all the new calculated fields and hide the single raw column src_all:
I preview the final output data to make sure everything matches what I expected:
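The unused-delimiter trick from the import step can be checked programmatically. The Python sketch below picks a character that never occurs in the raw lines, so that splitting on it leaves every line in a single column. The function name `find_unused_delimiter`, the candidate list and the sample lines are assumptions made for illustration; the real scan output may look different.

```python
def find_unused_delimiter(lines, candidates="^|~`"):
    """Return the first candidate character absent from every line, or None."""
    used = set()
    for line in lines:
        used.update(line)
    for char in candidates:
        if char not in used:
            return char
    return None


# Hypothetical raw lines standing in for the scan output.
raw_lines = [
    "C:\\Users\\me\\photo.png 5000000",
    "C:\\Games\\pak1.ucas 4000000000",
]
delimiter = find_unused_delimiter(raw_lines)

# Splitting on an unused delimiter keeps each line as a single field,
# which plays the role of the raw src_all column.
columns = [line.split(delimiter) for line in raw_lines]
```

If every candidate appears somewhere in the data, the function returns `None`, which is a signal to try a different character.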