All,
I'm encountering some odd behavior with my Lustre file system. It looks like the
MDT is briefly losing contact with OSTs. This does not seem confined to a specific OSS,
and I've found some OSSs whose OSTs don't seem to be
"flickering" at all. The MDS (the server mounting the MDT) is showing an
exceptionally high load, but the Lustre file system itself still appears responsive to
clients (I can write to it, and lfs df / df seem to be working properly). A further piece of
the puzzle is that several OSTs that were active are now showing as inactive from the MDT
(via the lctl dl command).
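
For anyone wanting to pull the inactive OSTs out of the lctl dl listing, here is a minimal sketch. It assumes the usual six-column layout (index, state, type, name, UUID, refcount) and that inactive devices show a state other than UP (typically IN). The function name and the sample lines are made up for illustration; they are not output from the affected system.

```python
def inactive_devices(lctl_dl_output, dev_type="osc"):
    """Return names of devices of the given type whose state is not UP.

    Assumes each device line looks like:
        <index> <state> <type> <name> <uuid> <refcount>
    """
    inactive = []
    for line in lctl_dl_output.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a device entry.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        _index, state, kind, name = fields[:4]
        if kind == dev_type and state != "UP":
            inactive.append(name)
    return inactive


# Illustrative sample only (hypothetical device names).
SAMPLE = """\
  0 UP mgs MGS MGS 5
  1 UP mdt MDS MDS_uuid 3
  2 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  3 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
  4 IN osc lustre-OST0001-osc lustre-mdtlov_UUID 5
"""

if __name__ == "__main__":
    print(inactive_devices(SAMPLE))
```

On the MDS the real listing could be captured with subprocess and passed to the same function, then compared between runs to see which OSTs are flickering.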
I'm running Lustre 1.8.8 on all of the Lustre file servers, with CentOS 5.3 / CentOS
5.5 (the problem occurs on both 5.3 and 5.5 servers). We are also using an InfiniBand (IB)
network to push the data around.
Thank you,
Kurt J. Strosahl
System Administrator
Scientific Computing Group, Thomas Jefferson National Accelerator Facility