Every now and then, we have to tidy up HDFS, which is one of the most cumbersome chores we know. You know the drill: it means taking the time to navigate the filesystem and decide which files are worth keeping and which aren’t. Every single time we do this, we find ourselves thinking how helpful it would be to have a ‘tree’-like utility in the HDFS CLI toolset.
It would be even better if this tool reported the size of each directory in the filesystem (for example, that a given directory holds 500 GB), so we wouldn’t have to work it out manually.
Instead of complaining, every single time we had to tidy up the filesystem, that nobody had coded the tool we needed, we finally coded it ourselves. Since we believe it’s quite a useful tool, we decided to open-source it.
The output looks pretty much like that of the Unix tree command. Here’s an example:
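To give a rough feel for the idea, here is a minimal Python sketch of a tree view with per-directory sizes. It is not the tool’s actual code or output: the function names (`human_size`, `directory_sizes`, `render_tree`), the sample paths, and the sizes are all made up for illustration, and it assumes you already have a mapping from absolute file paths to sizes in bytes (for instance, parsed from a recursive HDFS listing).

```python
from collections import defaultdict
import posixpath


def human_size(num_bytes):
    """Format a byte count using binary units (1 KB = 1024 B)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1024


def directory_sizes(file_sizes):
    """Add each file's size to every ancestor directory, up to the root."""
    totals = defaultdict(int)
    for path, size in file_sizes.items():
        parent = posixpath.dirname(path)
        while True:
            totals[parent] += size
            grandparent = posixpath.dirname(parent)
            if grandparent == parent:  # reached "/" (or "" for a relative path)
                break
            parent = grandparent
    return dict(totals)


def render_tree(file_sizes, root="/"):
    """Return tree-style lines, one per directory, each with its total size."""
    totals = directory_sizes(file_sizes)

    # Group every directory under its parent so we can walk top-down.
    children = defaultdict(set)
    for directory in totals:
        if posixpath.dirname(directory) != directory:
            children[posixpath.dirname(directory)].add(directory)

    lines = [f"{root} [{human_size(totals.get(root, 0))}]"]

    def walk(directory, prefix):
        kids = sorted(children[directory])
        for index, child in enumerate(kids):
            last = index == len(kids) - 1
            connector = "└── " if last else "├── "
            name = posixpath.basename(child)
            lines.append(f"{prefix}{connector}{name} [{human_size(totals[child])}]")
            walk(child, prefix + ("    " if last else "│   "))

    walk(root, "")
    return lines


# Hypothetical sample data: absolute file paths mapped to sizes in bytes.
sample = {
    "/data/logs/a.log": 1024,
    "/data/logs/b.log": 1024,
    "/data/raw/c.csv": 2048,
    "/tmp/x": 512,
}
print("\n".join(render_tree(sample)))
```

The sketch only lists directories, since those are what we want sized at a glance; each file’s bytes are simply rolled up into every directory above it.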
Of course, this little tool is not a lifesaver, but we think it might make your day a little better :)
So, have a look at it on GitHub and tell us what you think!