On Sep 17, 2014, at 2:45 AM, "Mohr Jr, Richard Frank (Rick Mohr)"
<rmohr(a)utk.edu> wrote:
> When I have had to delete large dir trees in the past, I have used
> tricks like "find | xargs" to help speed things up. But on the occasions
> that I did run "rm -rf", I never ran into an issue with the MDS oom'ing
> (although that was on a Lustre 1.8 file system, so maybe that was the
> difference). I think the only time I had the MDS oom was when a user
> opened/closed so many files in their batch job that the MDS memory was
> consumed with lock structures. Although in that case, it was easy to
> identify this as the cause of the problem because /proc/slabinfo showed
> the Lustre locks consuming the memory. In my case with "rm -rf", I don't
> find anything in /proc/slabinfo that comes close to accounting for all
> the buffer cache data that is reported as used.
>
> ...
>
> I may have to find some extra hardware to run tests on to see if I can
> reproduce the issue, and then I can try an upgrade to see if it corrects
> the problem.
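
(As an aside, for anyone who hasn't used them, the "find | xargs" trick and
the /proc/slabinfo check mentioned above look roughly like the following.
This is only a sketch; the xargs options, batch sizes, and ldlm slab names
are illustrative rather than exactly what was run.)

    # Remove a large tree by feeding paths to rm in fixed-size batches
    # (optionally in parallel) instead of a single "rm -rf" walk.
    find $DIRNAME -type f -print0 | xargs -0 -n 1000 -P 4 rm -f
    # Then remove the now-empty directories, deepest first.
    find $DIRNAME -depth -type d -print0 | xargs -0 -n 1000 rmdir

    # Check whether Lustre DLM lock structures account for the memory;
    # the ldlm_* slab caches are the usual suspects.
    grep ldlm /proc/slabinfo
    slabtop -o | head -20
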
In case this is of interest to anyone, I wanted to share some results from
testing I did with our Lustre 2.4.3 file system with regard to MDS memory
usage. (The more relevant results would come from testing on Lustre 2.5.3
and 2.6; I am working on getting a test setup for that.)
In these tests, I created directory trees populated with empty files (stripe_count=1) and
then used various methods to delete the files. Before and after each test, I ran
"echo 1 > /proc/sys/vm/drop_caches" on the MDS and recorded the
"base" buffer usage.
Test #1) On Lustre 2.5.0 client, used "rm -rf" to remove approx 700K files.
         Buffer usage: before = 14.3 GB, after = 15.4 GB

Test #2) On Lustre 1.8.9 client, used "rm -rf" to remove approx 730K files.
         Buffer usage: before = 15.4 GB, after = 15.8 GB

Test #3) On Lustre 1.8.9 client, used the "clean_dirs.sh" script (thanks to
         Steve Ayers) to remove approx 650K files.
         Buffer usage: before = 15.8 GB, after = 16.0 GB

Test #4) On Lustre 2.5.0 client, used the "clean_dirs.sh" script to remove
         approx 730K files.
         Buffer usage: before = 16.0 GB, after = 16.25 GB

Test #5) On one Lustre 2.5.0 client and two Lustre 1.8.9 clients, used
         "rm -rf" to delete approx 330K files on each host simultaneously.
         Buffer usage: before = 16.26 GB, after = 17.63 GB

Test #6) Similar to test #5, but used "rm -rf" to delete files in groups of
         approx 110K (roughly as sketched after this list). The deletion of
         these groups was staggered in time across the three nodes: at some
         points two nodes were deleting simultaneously, and at other times
         only one node was deleting files.
         Buffer usage: before = 17.63 GB, after = 17.8 GB

Test #7) On Lustre 2.5.0 client, deleted 9 groups of 110K files each. The
         groups were deleted sequentially with some pauses between groups.
         Buffer usage: before = 17.8 GB, after = 17.9 GB

Test #8) On Lustre 1.8.9 client, used "find $DIRNAME -delete" to remove
         approx 1M files.
         Buffer usage: before = 17.9 GB, after = 19.4 GB
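
The grouped deletions in tests #6 and #7 were done along these lines (a
rough sketch only; the per-group subdirectory layout and the sleep interval
are made up for illustration, not the actual values used):

    # Each group of ~110K files lives in its own subdirectory;
    # delete one group at a time with a pause in between.
    for group in $DIRNAME/group.*; do
        rm -rf "$group"
        sleep 600   # pause between groups (interval is illustrative)
    done
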
The tests showed quite a bit of variance between nodes as well as between
the tools used to delete the files. The lowest increase in buffer usage
seemed to occur when files were deleted sequentially in smaller batches.
Beyond that, I don't see any consistent pattern, other than the fact that
the base buffer usage always seems to increase when files are deleted in
large numbers. The setup was not completely ideal since other users were
actively using the file system at the same time. However, there are a
couple of things to note:
1) Over the course of the week, the MDS base buffer usage increased from
14.3 GB to 19.4 GB. These increases occurred only during my file removal
tests, and the base buffer usage never decreased at any point.
2) Other file system activity did not seem to contribute to the increase in
base buffer usage. During the nights/weekend when I was not testing, the
overall buffer usage did increase. However, when I dropped the caches to
measure the base buffer usage, it always returned to the same (or very
nearly the same) value as the day before. I also observed an application
doing millions of file open/read/close operations, and none of that
increased the base buffer usage.
I am not sure exactly what is happening here, but I figured I might as well put the info
out there in case it is helpful to anyone.
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu