I'll be on Kirkwood later today and will take a jaunt toward campus just for you.
Come visit! I'm good for a beer or more at Nick's or one of our booming brewpubs.
On Sep 16, 2014, at 9:55, steve ayer <steve.ayer(a)trd2inc.com> wrote:
as a long-standing member of the open source community (sell hardware, not software!) and,
coincidentally, as a multi-alum of iu, i am delighted to contribute my dirt-simple hack.
give the archway under memorial hall a pat for me...
i hope that this helps,
> On 09/16/2014 08:57 AM, Stephen Simms wrote:
> Hi Steve-
> Here at IU we just hit the rm -rf problem in spades, because lots of bio guys here are running
> Trinity, which has a step that creates thousands of files in thousands of directories. We
> didn't realize the extent of the removal problem until we met with the bio folks last
> week, so we are about to embark on creating a script like yours.
> Is there any chance you could share your script with the community, or even just with my
> team? Sadly, we are currently hamstrung and can't upgrade to 2.6. Your script would save
> us the time of writing our own and would give us something to offer the Trinity users
> quickly, so they could clean up their files after each run.
> If you can't or won't, I completely understand the difficulties of sharing
> code with other organizations and institutions.
> Thanks very much for your time and consideration!
> Stephen Simms
> Lustre Community Representative Board Member, OpenSFS
> Manager, High Performance File Systems, Indiana University
>> On Sep 16, 2014, at 8:07, steve ayer <steve.ayer(a)trd2inc.com> wrote:
>> hi rick,
>> oh, yeah.
>> a very expensive filesystem operation, aggravated by myriad related bugs in
>> <= 2.5.x. while still saddled with these versions, i went so far as to write a little shell
>> script that walked the directory tree depth first and blitzed files one at a time to keep
>> from hanging the mds.
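steve's actual shell script isn't included in this thread. As a rough illustration only, here is a minimal Python sketch of the approach he describes: walk the tree depth first and remove files one at a time. The function name `remove_tree_gently` and the optional `pause_s` delay between unlinks are hypothetical additions, not part of his script, and whether pacing the unlinks actually relieves a particular MDS is an assumption you would need to test on your own system.

```python
import os
import time


def remove_tree_gently(top: str, pause_s: float = 0.0) -> int:
    """Delete everything under `top`, then `top` itself, depth first,
    unlinking one entry at a time (hypothetical helper, not steve's script).

    Returns the number of non-directory entries (files, symlinks) unlinked.
    """
    removed = 0
    # topdown=False makes os.walk yield the deepest directories first, so
    # each subdirectory is already empty by the time its parent is visited.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in filenames:
            os.unlink(os.path.join(dirpath, name))
            removed += 1
            if pause_s:
                # Hypothetical throttle: spread the unlinks out over time.
                time.sleep(pause_s)
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # Symlinks to directories are listed in dirnames but not followed;
            # remove the link itself instead of calling rmdir on it.
            if os.path.islink(path):
                os.unlink(path)
                removed += 1
            else:
                os.rmdir(path)
    os.rmdir(top)
    return removed
```

The bottom-up walk is what gives the depth-first order: by the time a directory is passed to `os.rmdir`, all of its children have already been removed, so there is no single recursive call doing all the work at once. You would point it at a finished run's output directory, optionally with a small `pause_s`.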
>> the short answer is that diagnosing this problem will consume far more time (after
>> which you will still have the problem) than the simplest solution, which is to upgrade
>> your machinery to 2.6.
>>> On 09/15/2014 05:32 PM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>>> I have run into a situation where a user's "rm -rf" process seems to cause very high
>>> buffer usage on our MDS. I verified that this process was the cause by sending the
>>> STOP signal to the "rm" command and noticing that the growth of the buffers on the MDS
>>> slowed to a crawl. If I then sent the CONT signal, the buffer size would start growing
>>> again. I dropped all the caches on the MDS and timed the growth: it increased by about
>>> 17 GB in 15 minutes. I have been dropping the caches periodically in an effort to contain
>>> the growth while I investigate. Unfortunately, after each drop the new low point is
>>> always a little higher than it was before, which means there will come a point where
>>> dropping the caches will no longer be effective.
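One way to reproduce this kind of measurement is to sample the kernel's buffer counter while the suspect process runs. The Python sketch below assumes that the "buffer usage" in question is the `Buffers:` line of `/proc/meminfo` (Rick doesn't say which tool he used), and the helper names `buffers_kib`, `growth_gib_per_min`, and `sample_buffers` are hypothetical. Dropping caches beforehand would typically be done as root with `echo 3 > /proc/sys/vm/drop_caches`.

```python
import time


def buffers_kib(meminfo_text: str) -> int:
    """Return the 'Buffers:' value (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("Buffers:"):
            # Line looks like: "Buffers:          123456 kB"
            return int(line.split()[1])
    raise ValueError("no Buffers: line found")


def growth_gib_per_min(start_kib: int, end_kib: int, minutes: float) -> float:
    """Average buffer growth between two samples, in GiB per minute."""
    return (end_kib - start_kib) / (1024 * 1024) / minutes


def sample_buffers(interval_s: float, count: int, path: str = "/proc/meminfo"):
    """Read Buffers `count` times, `interval_s` seconds apart.

    Yields (elapsed_seconds, buffers_kib) tuples.
    """
    t0 = time.monotonic()
    for i in range(count):
        with open(path) as f:
            yield time.monotonic() - t0, buffers_kib(f.read())
        if i + 1 < count:
            time.sleep(interval_s)
```

Treating Rick's 17 GB as GiB, `growth_gib_per_min(0, 17 * 1024 * 1024, 15)` works out to roughly 1.13 GiB per minute. Pausing the process between samples with `kill -STOP <pid>` and resuming it with `kill -CONT <pid>` shows whether the growth tracks that process, which is the same check described above.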
>>> With that in mind, I have had to stop the user's "rm" process to contain the damage.
>>> Since that is only a temporary band-aid, I am trying to get a handle on the underlying
>>> problem. I searched the Lustre bug tracker, and the closest matches I could find were
>>> LU-4740 (and maybe LU-4906?), but it's not clear whether either is the cause of my
>>> problem. The odd thing is that a recursive rm is not an uncommon command to run, and I
>>> have not noticed this behavior before.
>>> The server is running:
>>> - CentOS 6.5
>>> - kernel 2.6.32-358.23.2
>>> - lustre 2.4.3
>>> The client is running:
>>> - CentOS 6.2
>>> - kernel 2.6.32-358.23.2
>>> - lustre 1.8.9-wc1
>>> Has anyone else seen a similar issue?
>> HPDD-discuss mailing list