If recovery is aborted, any clients that did not complete the recovery process will be
evicted by the MDS. If I remember correctly, there is a limit on the amount of
time that recovery will run. The time limit might get extended as more clients reconnect,
but if there is no activity from the clients, the whole recovery process should time out at
some point. What does "lctl get_param mdt.*.recovery_status" show? Have any
clients completed (or even started) recovery? I don't think the recovery timeout
starts counting down until at least one client has reconnected. If something is
preventing the clients from contacting the MDS, the server may just be sitting
there indefinitely.
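
To make the check above concrete, here is a rough sketch (in Python) of how one might read the recovery_status output and flag the "no client has reconnected yet" case. The field names (status, connected_clients, completed_clients) are the ones I recall from recovery_status, but treat them as assumptions for your Lustre version. The sample text and the target name testfs-MDT0000 are made up for illustration and are not output from a real system. The helper names are hypothetical as well.

```python
def parse_recovery_status(text):
    """Collect 'key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def clients_progress(fields, key):
    """Return (done, total) for a value such as '1/2', or None if it is not numeric."""
    value = fields.get(key)
    if value is None or "/" not in value:
        return None
    done, total = value.split("/", 1)
    if not (done.isdigit() and total.isdigit()):
        return None
    return int(done), int(total)


def no_client_has_connected(fields):
    """True when the target is still recovering and zero clients have reconnected."""
    progress = clients_progress(fields, "connected_clients")
    return bool(
        fields.get("status") == "RECOVERING"
        and progress is not None
        and progress[0] == 0
    )


# Illustrative sample only (hypothetical values, assumed field names).
SAMPLE = """\
mdt.testfs-MDT0000.recovery_status=
status: RECOVERING
connected_clients: 0/12
completed_clients: 0/12
"""

if __name__ == "__main__":
    fields = parse_recovery_status(SAMPLE)
    if no_client_has_connected(fields):
        print("Still recovering and no client has reconnected; check client-to-MDS connectivity.")
```

In practice you would feed it the text printed by "lctl get_param mdt.*.recovery_status" on the MDS instead of SAMPLE. If it reports that no client has connected, I would look at the network path between the clients and the MDS before aborting recovery.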
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
On May 20, 2014, at 10:13 PM, Javed Shaikh <javed.shaikh(a)anu.edu.au> wrote:
CentOS 6.4 / Lustre 2.4.2 (both client and servers)
hi,
it looks like the MDTs are not recovering after more than 12 hours of being in that state.
there's hardly any activity happening on the MDS.
what would happen if the recovery is aborted through lctl?
thanks,
javed