User home folders are limited to 100GB, and no customization is allowed. To our users who were previously limited to 20GB, that's great news. To the others who had 600GB allocations, that's a disaster. Oh, well. Just one among many.
When you log in to hpc.crc.ku.edu, a system status message appears. One item in that report is disk usage. Here's what I see today:
Primary group: hpc_crmda
Default Queue: crmda
$HOME = /home/pauljohn
<GB> <soft> <hard> : <files> <soft> <hard> : <path to volume> <pan_identity(name)>
65.04 85.00 100.00 : 136150 85000 100000 : /home/pauljohn uid:xxxxxx(pauljohn)
$WORK = /panfs/pfs.local/work/crmda/pauljohn
Filesystem                 Size  Used Avail Use% Mounted on
panfs://pfs.local/work      14T  1.6T   13T  12% /panfs/pfs.local/work/crmda/pauljohn
$SCRATCH = /panfs/pfs.local/scratch/crmda/pauljohn
Filesystem                 Size  Used Avail Use% Mounted on
panfs://pfs.local/scratch   55T   37T   19T  67% /panfs/pfs.local/scratch/crmda/pauljohn
In case you want to see the same output, the new cluster has a command called "mystats" which will display it again. In the terminal, run
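mystats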
In the output for my home folder, you can see there is a "hard" limit at 100GB. That is not adjustable under the current regime.
The main concern today is that I'm over the limit on the number of files. The limit is now 100,000 files, but I have 136,150. While I'm over the limit, I am not allowed to create new files. If I remain over the limit, the system can prevent me from doing my job.
Wait a minute. 136,150 files? WTH? Last time I checked, there were only 135,998 files and I'm sure I did not add any. Did some make babies? Do you suppose some R files found some C++ files and made an Rcpp project? (That's programmer humor. It knocks them out at conferences.)
I probably have files I don't need any more. I'm pretty sure that, for example, when I compile R, it uses tens of thousands of files. Maybe I can move that work somewhere else.
I wondered how I could find out where all those files are. We asked, and the best suggestion so far is to run the following, which walks through each top-level directory and counts the files inside it.
for i in $(find . -maxdepth 1 -type d); do echo "$i"; find "$i" -type f | wc -l; done
The output shows directory names and file counts, like this:
./tmp
17365
./work
46
./.emacs.d
0
./src
25519
./texmf
1794
./packages
5041
./SVN
4321
./Software
12014
./.ccache
995
./TMPRlib-3.3
19316
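If you would rather see the worst offenders first, a small variant of that loop (standard shell, though I'll admit I have only lightly tested it) prints each count next to its directory name and sorts in descending order:

for i in $(find . -maxdepth 1 -type d); do printf "%8d %s\n" "$(find "$i" -type f | wc -l)" "$i"; done | sort -rn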
I'll have to sift through that. Clearly, there are some files I can live without. I've got about 20K files in TMPRlib, which is a staging spot where I build R packages before I put them in the generally accessible part of the system. .ccache is the compiler cache; I can delete those files. They are only saved to speed up repeat C compiler jobs and will be regenerated as needed, but I have to make a choice there.
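For the record, that cleanup amounts to something like the following. The TMPRlib-3.3 path is my own build area, and the ccache -C call assumes the ccache program is on your PATH:

rm -rf ~/TMPRlib-3.3   # delete the temporary R package build area
ccache -C              # empty the compiler cache; it will be rebuilt as needed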
So far, I've obliterated the temporary build information, but I remain over the quota. I'll show the output from "mystats" so that you can see the difference:
$ mystats
Primary group: hpc_crmda
Default Queue: crmda
$HOME = /home/pauljohn
<GB> <soft> <hard> : <files> <soft> <hard> : <path to volume> <pan_identity(name)>
63.26 85.00 100.00 : 113510 85000 100000 : /home/pauljohn uid:xxxxx(pauljohn)
$WORK = /panfs/pfs.local/work/crmda/pauljohn
Filesystem                 Size  Used Avail Use% Mounted on
panfs://pfs.local/work      14T  1.6T   13T  12% /panfs/pfs.local/work/crmda/pauljohn
$SCRATCH = /panfs/pfs.local/scratch/crmda/pauljohn
Filesystem                 Size  Used Avail Use% Mounted on
panfs://pfs.local/scratch   55T   37T   19T  67% /panfs/pfs.local/scratch/crmda/pauljohn
Oh, well, I'll have to cut/move more things.
The take-aways from this post are:
The CRC put in place a hard, unchangeable 100GB limit on user home directories.
There is also a limit of 100,000 on the number of files that can be stored within that space. Users will need to cut files to get under the limit.
One can use the find command in the shell to find out where the files are.
How to avoid the accidental buildup of files? The main issue is that compiling software (R packages) creates intermediate object files that are not needed once the work is done. It is difficult to police these files (at least it is for me).
I don't have time to write all this down now, but here is a hint. The question is where to store "temporary" files that are needed to compile software or run a program, but are not needed after that. In many programming chores, one can link the "build" folder to a faster, temporary storage device that is not in the network file system. In the past, I've usually used "/tmp/a_folder_i_create" because that is on the disk "in" the compute node. Disk access on the local disk is much faster than the network file system. Lately, I'm told it is even faster to put temporary material in "/dev/shm", but I don't have much experience with that yet. With a little clever planning, one can write the temporary files to a much faster memory disk that is easily disposed of and, so far as I can see today, does not count against the file quota. This is not to be taken lightly. I've compared the time required to compile R using the network file storage against the local temporary storage. The difference is 45 minutes versus 15 minutes.
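Here is a minimal sketch of the idea for a compile job, assuming a Linux compute node where /dev/shm is a memory-backed tmpfs; the directory names are just illustrations, not anything the cluster prescribes:

BUILDDIR=/dev/shm/$USER/R-build   # node-local memory disk, not the network file system
mkdir -p "$BUILDDIR"
cd "$BUILDDIR"
# ... unpack sources and compile here instead of under $HOME ...
rm -rf "$BUILDDIR"                # clean up when done; /dev/shm space is limited RAM

The same trick works with a folder under /tmp on the node's local disk if memory is tight.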