All,
It turns out the problem was caused by initially trying to mount the OSTs on
the wrong node in an HA environment. I re-ran "tunefs.lustre --writeconf"
and then mounted each OST on its correct node, and everything worked.
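For anyone hitting the same thing, a rough sketch of the recovery steps follows. The device paths and mount points are taken from the logs below; the second device and the HA node layout are assumptions, so adjust both to your own site. It runs in dry-run mode by default and only prints the commands.

```shell
#!/bin/sh
# Sketch of the recovery, assuming an HA OSS pair and multipath device
# names like those in the logs (hypothetical beyond ost_home_5).
DRY_RUN=1                      # set to 0 to actually execute the commands
run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "$@"              # dry run: print the command instead of running it
    else
        "$@"
    fi
}

# 1. Regenerate the Lustre configuration logs on each OST device.
for dev in /dev/mapper/ost_home_5 /dev/mapper/ost_home_6; do
    run tunefs.lustre --writeconf "$dev"
done

# 2. Mount each OST on the node that is its primary server in the HA
#    layout; mounting on the wrong node first is what caused the failures.
run mount -t lustre /dev/mapper/ost_home_5 /lustre/home/ost_home_5
```

With DRY_RUN=1 this just prints what would be run, which is a handy sanity check before touching the real devices.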
Thanks,
Brian
On 05/21/2014 02:01 PM, Brian C. Huffman wrote:
All,
We had a power failure last evening and both our MDS (combined MDT /
MGT) and OSS servers went down.
Upon power-up, the MGT and all MDTs mounted correctly. Some of the
OSTs mounted but not all.
I unmounted everything and then ran an e2fsck on the OSTs that didn't
mount (just a basic "e2fsck <device>"). On one of those OSTs, an inode
was corrected:
Pass 5: Checking group summary information
Inode bitmap differences: -76225610
Fix<y>? yes
However, the OSTs still wouldn't mount and I was seeing these messages
in the log:
May 21 11:28:06 oss2 lrmd: [3351]: info: RA output:
(lustre-ost5:start:stderr) mount.lustre: mount /dev/mapper/ost_home_5
at /lustre/home/ost_home_5 failed: No such device or address The
target service failed to start (bad config log?)
(/dev/mapper/ost_home_5). See /var/log/messages.
So I then unmounted everything and ran "tunefs.lustre --writeconf" on
each device.
Now on mount, I'm seeing the following:
May 21 14:00:19 oss1 kernel: LDISKFS-fs (dm-5): mounted filesystem
with ordered data mode
May 21 14:00:19 oss1 multipathd: dm-5: umount map (uevent)
May 21 14:00:19 oss1 kernel: JBD: barrier-based sync failed on dm-5-8
- disabling barriers
May 21 14:00:19 oss1 kernel: LDISKFS-fs (dm-5): mounted filesystem
with ordered data mode
May 21 14:00:19 oss1 kernel: Lustre: MGC172.16.11.5@o2ib: Reactivating
import
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(obd_mount.c:1156:server_start_targets()) no server named
home-OST0002 was started
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(obd_mount.c:1670:server_fill_super()) Unable to start targets: -6
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(obd_mount.c:1453:server_put_super()) no obd home-OST0002
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(ldlm_request.c:1039:ldlm_cli_cancel_req()) Got rc -108 from
cancel RPC: canceling anyway
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(ldlm_request.c:1597:ldlm_cli_cancel_list())
ldlm_cli_cancel_list: -108
May 21 14:00:20 oss1 kernel: JBD: barrier-based sync failed on dm-5-8
- disabling barriers
May 21 14:00:20 oss1 multipathd: dm-5: umount map (uevent)
May 21 14:00:20 oss1 kernel: Lustre: server umount home-OST0002 complete
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount (-6)
At this point, I'm not sure what to do next. Any suggestions?
Thanks,
Brian
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss@lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss