You don't really explain what you are using the "lfs find" data for, so it
is hard to help you optimize your usage. It is possible, for example, to specify multiple
OSTs at once for "lfs find" (e.g. if emptying 4 OSTs at once), but that may not
be what you want to do.
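For example, a single scan can cover several OSTs at once (the mount point and OST indices below are placeholders; adjust to your filesystem, and note that comma-separated index lists for "--ost" may depend on your Lustre version):

    # One pass over the namespace instead of four separate scans:
    lfs find /mnt/lustre --ost 0,1,2,3 -print

This avoids walking the whole namespace once per OST being emptied.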
As for the MDS memory problem, that is caused by huge inode/DLM lock caches on the
clients, and was fixed at one point. I don't know the bug number offhand, but you
could find it in Jira. As a workaround you can also periodically flush the lock caches on
the clients via:
lctl set_param ldlm.namespaces.*mdc*.lru_size=clear
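Alternatively, rather than clearing the cache outright, you can cap it at a fixed size so it never grows unbounded (the limit of 2000 below is an illustrative value, not a recommendation for your workload):

    # Cap the per-namespace MDC lock LRU at 2000 locks:
    lctl set_param ldlm.namespaces.*mdc*.lru_size=2000
    # Setting it back to 0 restores dynamic LRU sizing:
    lctl set_param ldlm.namespaces.*mdc*.lru_size=0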
You could avoid all of the repeated scanning by using Robin Hood to index the filesystem
once, and then do queries against the RBH database, and use the Lustre ChangeLog to keep
RBH updated without the need to re-scan the whole filesystem.
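The rough workflow is: register a ChangeLog consumer on the MDT, point Robin Hood at it, then answer "which files are on OST N" from the database instead of scanning. A sketch (the MDT name and OST index are examples, and the exact rbh-report options depend on your Robin Hood version):

    # On the MDS: register a ChangeLog reader for the MDT
    lctl --device lustre-MDT0000 changelog_register
    # Robin Hood consumes the log to stay current, then queries are cheap:
    rbh-report --dump-ost 12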
Lustre Principal Architect
Intel High Performance Data Division
On 2016/06/06, 08:58, "Kumar, Amit" wrote:
I believe there is no real way to optimize "lfs find", but I am still trying to see if I
can learn more in case there is.
Q1) I have been trying to scan 39 OSTs using lfs find, and this is taking forever. Are
there any tips or tricks to speed this up? Scans are taking anywhere between 15-24 hours
per OST to finish, if all goes well without interruption. I am parallelizing my scans
from multiple clients to speed this up, but I don't know of any alternate ways.
Q2) On the other hand, when I start "lfs find" on each of the 39 OSTs, I doom my MDS
server with a kernel panic due to an out-of-memory condition. Any tips on how I can
minimize this load and keep the MDS from running out of memory?
Q3) Situation: A client dies for some reason, or the "lfs find" command it is
running times out with an Input/Output error or "transport shutdown" (I never saw this
until I started running multiple lfs find commands while simultaneously running
lfs_migrate for the files that were identified to be moved off the OSTs).
Observation: I believe the MDS continues to run and serve the "lfs find" scan request
on a thread until it fails or notices that it has evicted the client, thereby tying up
resources on the MDS. I don't know if this makes sense, but I am guessing this is loading
my MDS with RPC requests and causing further slowdown. Are there any tunable options?
I wish there was some kind of indexing we could do to avoid deep scans.