While testing how to remove an OST from a Lustre 2.5.3 file system I have come across an unusual bug. After I deactivated the OST and unmounted it, I tried using the --replace option listed in man mkfs.lustre.
After that I noticed that I could no longer write to the OSTs that followed the one I'd been testing on, and having torn down and rebuilt the test file system I was unable to get the OST to work again (this would be on a completely fresh file system). While
trying to debug I came across the following:
22 AT osp testL-OST0003-osc-MDT0000 testL-MDT0000-mdtlov_UUID 1
Everything else on the file system is unmounted, and OST3 is not mounted on the system it was previously mounted on (an lctl dl there shows nothing loaded).
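For anyone who wants to watch for this in a script, here is a minimal sketch that splits a device line like the one above into fields and flags anything whose state is not UP. The field names (index, status, dev_type, name, uuid, refcount) are my own labels for the six whitespace-separated columns, and parse_device_line and not_up are hypothetical helpers, not part of any Lustre tool.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DeviceEntry:
    # Field names are assumed labels for the six columns of a device line.
    index: int
    status: str
    dev_type: str
    name: str
    uuid: str
    refcount: int


def parse_device_line(line: str) -> DeviceEntry:
    """Split one device line into its six whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    index, status, dev_type, name, uuid, refcount = fields
    return DeviceEntry(int(index), status, dev_type, name, uuid, int(refcount))


def not_up(lines: Iterable[str]) -> List[DeviceEntry]:
    """Return the entries whose status column is anything other than UP."""
    entries = (parse_device_line(l) for l in lines if l.strip())
    return [e for e in entries if e.status != "UP"]


if __name__ == "__main__":
    sample = "22 AT osp testL-OST0003-osc-MDT0000 testL-MDT0000-mdtlov_UUID 1"
    for entry in not_up([sample]):
        print(entry.index, entry.status, entry.name)
```

In practice the lines would come from saved device-list output rather than a hard-coded sample.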
I'm going to reboot the system, which I think should clear this issue, but I was curious what the AT means and, if it crops up in a production system, how I'd go about removing it.
I also wonder what I did that resulted in the AT state. I set the OST to inactive using:
lctl conf_param testL-OST0003.osc.active=1
then used the --replace option from mkfs.lustre to reuse index 3.
When I mounted it back up everything seemed functional, and I was able to use that OST to perform some benchmarking. It was only later, when I'd added more OSTs to the test environment, that I found I couldn't write to them.
Kurt J. Strosahl
Scientific Computing Group, Thomas Jefferson National Accelerator Facility
HPDD-discuss mailing list