On Tue, Sep 16, 2014 at 09:28:30AM -0400, Gary Molenkamp wrote:
Using lustre 2.5.3, 1 combined MDS/MDT, 44 OSTs. Currently
data, over 35M files.
On the weekend, our MDS server crashed due to an IO hang. After restarting the
server, we started hitting the LU-5040 bug during recovery:
kernel BUG at fs/jbd2/transaction.c:1033!
kernel: invalid opcode: 0000 [#1] SMP
I attempted a restart of all OST and MDT mounts with abort_recov, and the
filesystem mounted on a client with all OSTs connected. The
first access to any files or metadata caused the MDS to panic and also show
indications of LU-5392.
Is this indicating a corrupted quota subsystem? I was trying to find a means
of rebuilding the quota records. However, "lfs quotacheck" is no longer
supported as it states "since space accounting is always enabled".
If the quotas are corrupted, how can I recover them? Likewise, how can I
recover from the two bugs mentioned above? I have some time flexibility to
resolve it, if that would assist in getting the bugs addressed and my filesystem

if you just want to work around quota bugs for a while, then you can turn
off quotas with
tune2fs -Q ^usrquota /dev/mdt
tune2fs -Q ^grpquota /dev/mdt
and turn them on again later with
tunefs.lustre --quota /dev/mdt
we did this recently on our 2.5.ish filesystem
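
a rough sketch of that sequence as a dry-run shell script, in case it helps.
/dev/mdt is the placeholder device name from the commands above, and
/mnt/mdt is a made-up mount point. the umount/mount steps around each change
are my assumption (that the MDT should be offline while its on-disk features
are changed), so check them against your own procedure. by default it only
prints what it would run.

```shell
#!/bin/sh
# Sketch only. MDT_DEV is the placeholder device from the message above;
# MDT_MNT is a hypothetical mount point. Adjust both for a real system.
MDT_DEV="${MDT_DEV:-/dev/mdt}"
MDT_MNT="${MDT_MNT:-/mnt/mdt}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Turn off user and group quota on the MDT (the workaround).
disable_quota() {
    run umount "$MDT_MNT"                        # assumption: MDT offline first
    run tune2fs -Q '^usrquota' "$MDT_DEV"        # quoted so no shell treats ^ specially
    run tune2fs -Q '^grpquota' "$MDT_DEV"
    run mount -t lustre "$MDT_DEV" "$MDT_MNT"
}

# Turn quota back on later.
enable_quota() {
    run umount "$MDT_MNT"                        # assumption: MDT offline first
    run tunefs.lustre --quota "$MDT_DEV"
    run mount -t lustre "$MDT_DEV" "$MDT_MNT"
}

# Dispatch on the first argument; with no argument nothing is run.
case "${1:-}" in
    disable) disable_quota ;;
    enable)  enable_quota ;;
esac
```

run it as "sh quota_toggle.sh disable" (the file name is arbitrary) to see the
commands, and set DRY_RUN=0 only once the printed lines look right for your
device.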

Any assistance would be appreciated.
Gary Molenkamp SHARCNET
Systems Administrator University of Western Ontario
Compute/Calcul Canada http://www.computecanada.org
(519) 661-2111 x88429 (519) 661-4000
HPDD-discuss mailing list