Hmmm, shouldn't
mkfs.lustre --mgs --reformat $mgs_dev
be
mkfs.lustre --mgs --reformat --failnode=$mgs_sec_nid $mgs_dev
instead? And you should probably mount and unmount on both servers.
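Something like this (just a sketch, reusing the variable names from your
script below; I haven't tested it here):

# On MDS0 (primary MGS node)
mkfs.lustre --mgs --failnode=$mgs_sec_nid --reformat $mgs_dev
mount -t lustre $mgs_dev /lustre/mgs
umount /lustre/mgs

# On MDS1 (failover node), the same shared device (the path may differ there)
mount -t lustre $mgs_dev /lustre/mgs
umount /lustre/mgs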
On 08/27/2013 11:28 PM, Swapnil Pimpale wrote:
Hi All,
We are facing the following issue. There is a JIRA ticket open for it:
https://jira.hpdd.intel.com/browse/LU-3829
The description from the bug is as follows:
If multiple --mgsnode arguments are provided to mkfs.lustre while
formatting an MDT, then the mount of this MDT fails on the MDS where
the MGS is not running.
Reproduction Steps:
Step 1) On MDS0, run the following script:
mgs_dev='/dev/mapper/vg_v-mgs'
mds0_dev='/dev/mapper/vg_v-mdt'
mgs_pri_nid='10.10.11.210@tcp1'
mgs_sec_nid='10.10.11.211@tcp1'
mkfs.lustre --mgs --reformat $mgs_dev
mkfs.lustre --mgsnode=$mgs_pri_nid --mgsnode=$mgs_sec_nid \
  --failnode=$mgs_sec_nid --reformat --fsname=v --mdt --index=0 $mds0_dev
mount -t lustre $mgs_dev /lustre/mgs/
mount -t lustre $mds0_dev /lustre/v/mdt
So the MGS and MDT0 will be mounted on MDS0.
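As a quick sanity check (nothing specific to this issue), the targets
running on MDS0 can be listed with:

mount -t lustre    # shows the mounted Lustre targets
lctl dl            # shows the local obd devices and their status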
Step 2.1) On MDS1:
mdt1_dev='/dev/mapper/vg_mdt1_v-mdt1'
mdt2_dev='/dev/mapper/vg_mdt2_v-mdt2'
mgs_pri_nid='10.10.11.210@tcp1'
mgs_sec_nid='10.10.11.211@tcp1'
mkfs.lustre --mgsnode=$mgs_pri_nid --mgsnode=$mgs_sec_nid \
  --failnode=$mgs_pri_nid --reformat --fsname=v --mdt --index=1 $mdt1_dev
mount -t lustre $mdt1_dev /lustre/v/mdt1    # Does not mount.
The mount of MDT1 will fail with the following error:
mount.lustre: mount /dev/mapper/vg_mdt1_v-mdt1 at /lustre/v/mdt1
failed: Input/output error
Is the MGS running?
These are messages from Lustre logs while trying to mount MDT1:
LDISKFS-fs (dm-20): mounted filesystem with ordered data mode.
quota=on. Opts:
LDISKFS-fs (dm-20): mounted filesystem with ordered data mode.
quota=on. Opts:
LDISKFS-fs (dm-20): mounted filesystem with ordered data mode.
quota=on. Opts:
Lustre: 7564:0:(client.c:1896:ptlrpc_expire_one_request()) @@@ Request
sent has timed out for slow reply:[sent 1377197751/real
1377197751]req@ffff880027956c00 x1444089351391184/t0(0)
o250->MGC10.10.11.210@tcp1@0@lo:26/25 lens 400/544 e 0 to 1 dl
1377197756 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 8059:0:(client.c:1080:ptlrpc_import_delay_req()) @@@ send
limit expired req@ffff880027956800 x1444089351391188/t0(0)
o253->MGC10.10.11.210@tcp1@0@lo:26/25 lens 4768/4768 e 0 to 0 dl 0 ref
2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 15f-b: v-MDT0001: cannot register this server with the
MGS: rc = -5. Is the MGS running?
LustreError: 8059:0:(obd_mount_server.c:1732:server_fill_super())
Unable to start targets: -5
LustreError: 8059:0:(obd_mount_server.c:848:lustre_disconnect_lwp())
v-MDT0000-lwp-MDT0001: Can't end config log v-client.
LustreError: 8059:0:(obd_mount_server.c:1426:server_put_super())
v-MDT0001: failed to disconnect lwp. (rc=-2)
LustreError: 8059:0:(obd_mount_server.c:1456:server_put_super()) no
obd v-MDT0001
LustreError: 8059:0:(obd_mount_server.c:137:server_deregister_mount())
v-MDT0001 not registered
Lustre: server umount v-MDT0001 complete
LustreError: 8059:0:(obd_mount.c:1277:lustre_fill_super()) Unable to
mount (-5)
Step 2.2) On MDS1:
mdt1_dev='/dev/mapper/vg_mdt1_v-mdt1'
mdt2_dev='/dev/mapper/vg_mdt2_v-mdt2'
mgs_pri_nid='10.10.11.210@tcp1'
mgs_sec_nid='10.10.11.211@tcp1'
mkfs.lustre --mgsnode=$mgs_pri_nid --failnode=$mgs_pri_nid --reformat \
  --fsname=v --mdt --index=1 $mdt1_dev
mount -t lustre $mdt1_dev /lustre/v/mdt1
With this, MDT1 mounts successfully. The only difference is that the
second "--mgsnode" is not provided to mkfs.lustre.
Step 3) On MDS1 again:
mkfs.lustre --mgsnode=$mgs_pri_nid --mgsnode=$mgs_sec_nid \
  --failnode=$mgs_pri_nid --reformat --fsname=v --mdt --index=2 $mdt2_dev
mount -t lustre $mdt2_dev /lustre/v/mdt2
Once MDT1 is mounted, using a second "--mgsnode" option works without
any errors and the mount of MDT2 succeeds.
Lustre versions: reproducible on 2.4.0 and 2.4.91.
Conclusion: Due to this bug, MDTs do not mount on MDSs that are not
running the MGS. With the workaround, the MDT is formatted with only a
single MGS NID, so MGS failover (HA) will not be properly configured.
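One possible way to repair that after the fact (untested on my side):
once the MDT has been mounted once with the single --mgsnode, the second
MGS NID could probably be added on the unmounted device with
tunefs.lustre, e.g.

umount /lustre/v/mdt1
tunefs.lustre --mgsnode=$mgs_sec_nid $mdt1_dev
mount -t lustre $mdt1_dev /lustre/v/mdt1

but that would be a workaround for the workaround, not a fix.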
Also note that this issue is not related to DNE. The same issue and
"workaround" apply to an MDT of a different filesystem on MDS1 as well.
My initial thoughts on this are as follows:
In the above case, while mounting an MDT on MDS1, one of the --mgsnode
NIDs is MDS1 itself.
It looks like ptlrpc_uuid_to_peer() calculates the distance to each NID
using LNetDist() and chooses the one with the smallest distance, which
in this case turns out to be MDS1 itself, which does not have a running
MGS.
Removing MDS1's NID from --mgsnode and adding a different node's NID
worked for me.
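For what it is worth, I believe "lctl which_nid" can be used to see
which NID LNet would pick from such a list; on MDS1 something like

lctl which_nid $mgs_pri_nid $mgs_sec_nid

should (if my understanding is right) print MDS1's own NID,
10.10.11.211@tcp1.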
I would really appreciate any inputs on this.
Thanks!
--
Swapnil
--
Brian O'Connor
-------------------------------------------------------------