While testing how to remove an OST from a Lustre 2.5.3 file system I came across
an unusual bug. After I deactivated the OST and unmounted it, I tried using the
--replace option listed in man mkfs.lustre.
After that I noticed that I could no longer write to the OSTs that followed the one
I'd been testing on. Even after tearing down and rebuilding the test file system, I was
unable to get the OST to work again (this was on a completely fresh file system).
While trying to debug, I came across the following:
22 AT osp testL-OST0003-osc-MDT0000
Everything else on the file system is unmounted, and OST3 is not mounted on the system it
was previously mounted on (lctl dl there shows nothing loaded).
I'm going to reboot the system, which I think should clear this issue, but I am
curious what the AT state means and, if it crops up on a production system, how I'd go
about removing it.
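
For anyone wanting to check for the same condition, here is a minimal sketch that
picks out device entries whose state column is anything other than UP. It assumes
the usual lctl dl layout, with the device index first and the state second. The
function name non_up_devices is only illustrative and is not part of Lustre.

```shell
#!/bin/sh
# Print the index, state, type and name of every device that is not UP.
# Reads `lctl dl`-style lines on stdin, so it also works on saved output.
# Assumption: column 1 is the device index and column 2 is the state.
non_up_devices() {
    awk 'NF >= 4 && $2 != "UP" { print $1, $2, $3, $4 }'
}

# On a live node (needs the Lustre modules loaded):
#   lctl dl | non_up_devices
```

Run against the entry above, this would print that entry back, since AT is not UP.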
I also wonder what I did that resulted in the AT state. I set the OST to inactive using:
lctl conf_param testL-OST0003.osc.active=0
then used the --replace option from mkfs.lustre to reuse index 3.
When I mounted it back up everything seemed functional, and I was able to use that OST to
perform some benchmarking. It was only later, when I'd added more OSTs to the test
environment, that I found I couldn't write to them.
Kurt J. Strosahl
Scientific Computing Group, Thomas Jefferson National Accelerator Facility