Using too many files in your cluster account? What to do!

After the cluster restart yesterday, I got a message saying I'm over the limit on the number of files allowed in a home account. Previously I was allowed 100,000; now it appears the limit has been lowered to 85,000. Oh, well.

The email looks like this:

PanActive Manager Warning: User Quota Violation Soft (files)

Date:        Wed Feb 07 01:19:29 CST 2018           
System Name: pfs.local                              
System IP:   10.71.31.203 10.71.31.100 10.71.31.102 
Version      6.3.1.a-1371318.1                      
Customer ID: 1021288                                

User Quota Violation Soft (files):  Limit reached on volume /home for Unix User:   (Id:  uid:142424) Limit = 85.00 K.

The above message applies to the following component:
    Volume: /home 

If I wanted to find the directories in my account where I'm using a lot of storage, I would do this:

$ du -h --max-depth=1

to see the disk space used by each top-level directory under the current one (my home directory, in this case). However, that does not help me isolate the number of files in use.

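I checked with Riley at KU CRC and he gave me this magic recipe. It loops over each top-level directory, counts the regular files beneath it, and sorts the results by that count: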

$ for i in $(find . -maxdepth 1 ! -path . -type d); do echo -n $i": "; (find $i -type f | wc -l); done | sort -k 2 -n
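One caveat with the recipe above: the $(find ...) loop splits on whitespace, so a directory with a space in its name gets broken into pieces. Here is a rough sketch of a variant that copes with that. The function name count_files_per_dir is my own invention, and I am assuming a POSIX-style shell (sh or bash); treat it as a starting point, not as what CRC recommends.

```shell
# count_files_per_dir (hypothetical helper): print "<count><TAB><dir>"
# for each immediate subdirectory of $1 (default: current directory),
# smallest count first.
count_files_per_dir() {
    top="${1:-.}"
    # Two globs: ordinary directories, then hidden ones such as .cache.
    # Quoting "$top" and "$dir" keeps names with spaces in one piece.
    for dir in "$top"/*/ "$top"/.[!.]*/; do
        d="${dir%/}"                 # drop the trailing slash
        [ -d "$d" ] || continue      # skip a glob that matched nothing
        [ -L "$d" ] && continue      # skip symlinks, as find -type d does
        # Caveat: wc -l counts lines, so a file name that contains a
        # newline would be counted more than once.
        n=$(find "$d" -type f | wc -l)
        printf '%d\t%s\n' "$n" "$d"
    done | sort -n
}
```

Running count_files_per_dir ~ puts the count in the first column, so a plain sort -n is enough and the worst offenders land at the bottom. If your du comes from a recent GNU coreutils, du --inodes --max-depth=1 | sort -n should give a similar picture, though it counts directories as well as files.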

I'd seen that before, but forgotten. I can use the same sort trick to get a properly sorted du output, so I can also see which folders are holding the most knowledge (er, disk space).

$ du --max-depth=1 | sort -n

Note that I removed the human-readable flag ("-h"), so du prints plain block counts, and that "-n" tells sort to compare the leading number on each line numerically rather than as text.
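If you would rather keep the human-readable sizes, GNU sort has its own "-h" option that understands suffixes like K, M and G. A minimal sketch, assuming GNU coreutils for both du and sort (du_sorted is just a name I made up for the wrapper):

```shell
# du_sorted (hypothetical helper): human-readable sizes of the immediate
# subdirectories of $1 (default: current directory), smallest first.
# Assumes GNU coreutils: "du --max-depth" and "sort -h" are GNU options.
du_sorted() {
    du -h --max-depth=1 "${1:-.}" | sort -h
}
```

The biggest directories end up at the bottom of the list, next to the total for the directory you started in.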

