The Lustre ChangeLog is designed to handle outages like this, so it should be up-to-date even if RBH was offline for a few days.
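
If you want to double-check that the RBH changelog reader is still registered and hasn't missed any records, you can look at the changelog users on the MDS, something like (the wildcard just matches whichever MDTs you have):

  lctl get_param mdd.*.changelog_users

which lists each registered reader and the record index it has consumed, alongside the current changelog index.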

 

There also shouldn't be any problem querying all 39 OSTs at one time with "lfs find" - I believe the array holding the list of OSTs to match is dynamically sized.

 

You could always run both, and then compare the RBH results against "lfs find".  They should be largely the same, but may have minor differences for files added/deleted during the scan/query.
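
For example, if both tools are dumping one pathname per line, you could sort the two lists and compare them with comm; the paths and report options below are just placeholders for whatever you are actually running:

  lfs find /mnt/lustre --ost 5 | sort > lfs_ost5.txt
  rbh-report --dump-ost 5 | sort > rbh_ost5.txt    # adjust to your RBH report command
  comm -3 lfs_ost5.txt rbh_ost5.txt                # shows entries unique to either list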

 

Cheers, Andreas

-- 

Andreas Dilger

Lustre Principal Architect

Intel High Performance Data Division

 

On 2016/06/07, 09:29, "Kumar, Amit" <ahkumar@mail.smu.edu> wrote:

 

Hi Andreas,

 

Thank you for your response.

 

I am using “lfs find” to empty out OSTs, and I have 39 of them to empty at once.

 

Based on what you say here, I might benefit from running “lfs find” on multiple OSTs at once. I will give this a try. On the other hand, would it be overkill to include all 39 OSTs in a single “lfs find”?

 

As far as RBH goes, I just found out yesterday (from a direct reply to my message on the forum) that I could query against it, and that is much quicker, so I have been using it since yesterday. What I am not sure of is whether my RBH database is accurate, since I had it taken down a couple of times for maintenance, for a few hours each, in the last few months. I will look into whether there is a way to get RBH to rescan the entire file system and bring it up to date. That would solve my problems to a great extent.
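
If a full rescan turns out to be needed, I am assuming something along these lines would do it, though I still need to confirm against the RBH documentation (the config path is just an example):

  robinhood -f /etc/robinhood.d/lustre.conf --scan --once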

 

Best Regards,

Amit

 

From: Dilger, Andreas [mailto:andreas.dilger@intel.com]
Sent: Monday, June 6, 2016 11:24 PM
To: Kumar, Amit <ahkumar@mail.smu.edu>
Cc: hpdd-discuss@lists.01.org
Subject: Re: [HPDD-discuss] lfs find tips or tricks

 

You don't really explain what you are using the "lfs find" data for, so it is hard to help you optimize your usage.  It is possible, for example, to specify multiple OSTs at once for "lfs find" (e.g. if emptying 4 OSTs at once), but that may not be what you want to do.
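
For example, something like the following would restrict a single scan to several OSTs at once (the mount point and indices are placeholders, and depending on the Lustre version --ost takes a comma-separated list of OST indices or UUIDs):

  lfs find /mnt/lustre --ost 4,5,6,7 --type f > files_on_osts_4-7.txt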

 

As for the MDS memory problem, that is caused by huge inode/DLM lock caches on the clients, and was fixed at one point.  I don't know the bug number offhand, but you could find it in Jira.  As a workaround you can also periodically flush the lock caches on the clients via:

 

  lctl set_param ldlm.namespaces.*mdc*.lru_size=clear
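
If you have a lot of clients, it is easiest to run that over all of them with pdsh (or similar) before kicking off the scans, and then watch the effect on the lock counts; the client list below is just a placeholder, and parameter names can vary a bit between Lustre versions:

  pdsh -w client[001-100] lctl set_param ldlm.namespaces.*mdc*.lru_size=clear
  lctl get_param ldlm.namespaces.*.lock_count    # on the MDS, locks held per namespace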

 

You could avoid all of the repeated scanning by using Robin Hood to index the filesystem once, then run your queries against the RBH database, and use the Lustre ChangeLog to keep RBH updated without the need to re-scan the whole filesystem.
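
Setting that up is roughly a matter of registering a changelog user on the MDT and pointing the RBH changelog reader at it, e.g. (run on the MDS; the device name is an example for your filesystem, and the RBH option name may differ between versions):

  lctl --device lustre-MDT0000 changelog_register
  # prints a reader id such as cl1, which goes into the RBH configuration

after which robinhood can run in its changelog-reading mode (--readlog) instead of rescanning.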

 

Cheers, Andreas

-- 

Andreas Dilger

Lustre Principal Architect

Intel High Performance Data Division

 

On 2016/06/06, 08:58, "Kumar, Amit" <ahkumar@mail.smu.edu> wrote:

 

Dear All,

 

I believe there is no magic answer to optimizing “lfs find”, but I am still trying to see if there is more I can learn.

 

Q1) I have been trying to scan 39 OSTs using “lfs find” and this is taking forever. Are there any tips or tricks to speed this up? Scans are taking anywhere between 15-24 hours per OST to finish, if all goes well without interruption. I am parallelizing my scans from multiple clients to speed this up, but I don't know of any alternate ways.

 

Q2) On the other hand, when I start “lfs find” on each of the 39 OSTs, I have brought down my MDS server with a kernel panic due to an out-of-memory condition.  Any tips on how I can minimize this load and keep the MDS from running out of memory?

 

Q3) Situation: a client dies for any reason, or the “lfs find” command it is running times out with an Input/Output error or “transport shutdown” (I never saw this until I started running multiple “lfs find” commands and simultaneously running lfs_migrate for the files that were identified to be moved off the OSTs).

Observation: I believe the MDS continues to run and serve the scan request for the “lfs find” on a thread until it fails or notices that it has evicted the client, thus tying up resources on the MDS. I don't know if this makes sense, but I am guessing this is loading my MDS with RPC requests and causing further slowdown. Are there any tunable options here?

 

I wish there were some kind of indexing we could do to avoid deep scans.

 

Best Regards,

Amit