Hi all,

Me again :O Still working my way through a server build on what's basically Lustre-devel from Git.

I'm in Section 10 of the documentation trying to get the filesystem proper built and it's not quite going according to plan.

Brief background: One MGS/MDT and three OSS machines. MGS/MDT at 192.168.1.100 and the three OSS machines are at 192.168.1.101, 192.168.1.102 and 192.168.1.103.

So I go and build the combined MGS and MDT datastore on my MGS using the directions in Section 10.1 of the administration guide:

mkfs.lustre --fsname=lustre --mgs --mdt --index=0 --reformat /dev/md0

I then go and mount it up, no problem:

mkdir -p /mdt
mount -t lustre /dev/md0 /mdt

I see stuff in dmesg on the MGS/MDT machine and it looks good:

[  296.734221] LDISKFS-fs (md0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro
[  304.545375] LDISKFS-fs (md0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro
[  304.729341] Lustre: Lustre: Build Version: v2_7_55_0-g7cb2e4b-CHANGED-3.10.0-229.4.2.el7.centos_lustre.x86_64
[  305.516113] LDISKFS-fs (md0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache
[  305.964081] Lustre: ctl-lustre-MDT0000: No data found on store. Initialize space
[  305.992657] Lustre: lustre-MDT0000: new disk, initializing
[  306.110046] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt

So that all seems okay, but then I go over to my first OSS node ... I first try to run mkfs.lustre, that seems to complete okay:

mkfs.lustre --fsname=lustre --mgsnode=192.168.1.100@tcp0 --ost --index=1 --reformat /dev/md2

But then if I try to actually mount that, it pauses for a moment, then gives me a timeout error:

mkdir -p /ost1
mount -t lustre /dev/md2 /ost1

I see the following in dmesg ... Does an error -110 make any sense to anyone?

[ 1010.230310] Lustre: Lustre: Build Version: v2_7_55_0-g7cb2e4b-CHANGED-3.10.0-229.4.2.el7.centos_lustre.x86_64
[ 1011.468508] LDISKFS-fs (md2): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache
[ 1016.665780] Lustre: 4595:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1435697020/real 1435697020]  req@ffff881f958f8000 x1505437437394948/t0(0) o250->MGC192.168.1.100@tcp@192.168.1.100@tcp:26/25 lens 520/544 e 0 to 1 dl 1435697025 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[ 1022.955966] LustreError: 15f-b: lustre-OST0001: cannot register this server with the MGS: rc = -110. Is the MGS running?
[ 1022.956230] LustreError: 4561:0:(obd_mount_server.c:1789:server_fill_super()) Unable to start targets: -110
[ 1022.956388] LustreError: 4561:0:(obd_mount_server.c:1504:server_put_super()) no obd lustre-OST0001
[ 1022.956469] LustreError: 4561:0:(obd_mount_server.c:137:server_deregister_mount()) lustre-OST0001 not registered
[ 1023.216433] Lustre: server umount lustre-OST0001 complete
[ 1023.216439] LustreError: 4561:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount  (-110)
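For reading the rc values in logs like the above: kernel-side return codes are negated Linux errno numbers, so they can be looked up by hand. A minimal sketch of such a lookup follows. The helper name decode_rc is made up for illustration (it is not a Lustre tool), and the table only covers a handful of common Linux errno values.

```shell
#!/bin/sh
# decode_rc: hypothetical helper, NOT part of Lustre.
# Maps a (possibly negated) Linux errno number, as seen in
# "rc = -110" style log lines, to its symbolic name and message.
# Only a few values are listed; anything else is reported as unknown.
decode_rc() {
    rc=${1#-}   # strip a leading minus sign, if present
    case "$rc" in
        2)   echo "ENOENT: No such file or directory" ;;
        19)  echo "ENODEV: No such device" ;;
        110) echo "ETIMEDOUT: Connection timed out" ;;
        111) echo "ECONNREFUSED: Connection refused" ;;
        113) echo "EHOSTUNREACH: No route to host" ;;
        *)   echo "unknown rc: $1"; return 1 ;;
    esac
}

decode_rc -110
```

By that table, -110 is ETIMEDOUT, which lines up with the "timed out for slow reply" message on the request to the MGS.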

I've ensured that LNET is running, and I made sure to disable SELinux. These machines are all on an RFC 1918 subnet in a common broadcast domain, so there shouldn't be any firewalling or anything else in the way, and connectivity basically seems okay between the MGS/MDT and the OSS machines. Any thoughts?
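For reference, a basic LNET-level check from an OSS towards the MGS can be sketched roughly as below. This is only a sketch under assumptions: lctl is on the PATH, the MGS NID is 192.168.1.100@tcp as given above, and the function name check_lnet and the MGS_NID variable are made up for illustration.

```shell
#!/bin/sh
# Sketch only: basic LNET reachability check from an OSS to the MGS.
# MGS_NID defaults to the MGS address used in this post.
MGS_NID="${MGS_NID:-192.168.1.100@tcp}"

# check_lnet: hypothetical wrapper around two standard lctl subcommands.
check_lnet() {
    if ! command -v lctl >/dev/null 2>&1; then
        echo "lctl not found; skipping LNET checks"
        return 2
    fi
    echo "Local NIDs:"
    lctl list_nids            # NIDs configured on this node
    echo "Pinging $MGS_NID over LNET:"
    lctl ping "$MGS_NID"      # LNET-level ping, distinct from ICMP ping
}
```

Calling check_lnet on an OSS prints the local NIDs and then attempts an LNET ping of the MGS. An ICMP ping succeeding does not by itself show that LNET traffic gets through, which is why the LNET ping is kept as a separate step.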

Thanks!

Sean