I'd still guess some kind of network connectivity problem, like firewall rules or port
blocking (you need to allow connections on port 988). The rc = -110 is -ETIMEDOUT, i.e.
the OSS never got a reply from the MGS. Also check in /etc/hosts that your hostname
isn't mapped to 127.0.0.1 or another loopback address.
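A rough sketch of those two checks, assuming bash and awk on the OSS node. The function
names are made up for illustration, and 192.168.1.100 is the MGS address from the
original post:

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of Lustre.
MGS_IP="${MGS_IP:-192.168.1.100}"   # MGS address taken from the original post
LNET_PORT=988                       # port mentioned above

# Print any hosts-file lines that map the given hostname to a loopback address.
hosts_loopback_entries() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v h="$name" '
        /^[[:space:]]*#/ { next }              # skip comment lines
        $1 ~ /^127\./ || $1 == "::1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break           # stop at a trailing comment
                if ($i == h) { print; break }
            }
        }' "$file"
}

# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
tcp_port_open() {
    timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
```

On the OSS you could then run hosts_loopback_entries "$(hostname)" (any output is
suspicious) and tcp_port_open "$MGS_IP" "$LNET_PORT" && echo reachable || echo blocked.
That only shows plain TCP reachability; lctl ping 192.168.1.100@tcp is the closer test
of LNET itself.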
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
On 2015/06/30, 3:07 PM, "Sean Caron" <scaron@umich.edu> wrote:
Hi all,
Me again :O Still working my way through a server build on what's basically Lustre-devel
from Git.
I'm in Section 10 of the documentation trying to get the filesystem proper built and
it's not quite going according to plan.
Brief background: One MGS/MDT and three OSS machines. MGS/MDT at 192.168.1.100 and the
three OSS machines are at 192.168.1.101, 192.168.1.102 and 192.168.1.103.
So I go and build the combined MGS and MDT datastore on my MGS using the directions in
Section 10.1 of the administration guide:
mkfs.lustre --fsname=lustre --mgs --mdt --index=0 --reformat /dev/md0
I then go and mount it up, no problem:
mkdir -p /mdt
mount -t lustre /dev/md0 /mdt
I see stuff in dmesg on the MGS/MDT machine and it looks good:
[ 296.734221] LDISKFS-fs (md0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro
[ 304.545375] LDISKFS-fs (md0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro
[ 304.729341] Lustre: Lustre: Build Version: v2_7_55_0-g7cb2e4b-CHANGED-3.10.0-229.4.2.el7.centos_lustre.x86_64
[ 305.516113] LDISKFS-fs (md0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache
[ 305.964081] Lustre: ctl-lustre-MDT0000: No data found on store. Initialize space
[ 305.992657] Lustre: lustre-MDT0000: new disk, initializing
[ 306.110046] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
So that all seems okay, but then I go over to my first OSS node ... I first try to run
mkfs.lustre, that seems to complete okay:
mkfs.lustre --fsname=lustre --mgsnode=192.168.1.100@tcp0 --ost --index=1 --reformat /dev/md2
But then if I try to actually mount that, it pauses for a moment, then gives me a timeout
error:
mkdir -p /ost1
mount -t lustre /dev/md2 /ost1
I see the following in dmesg ... Does an error -110 make any sense to anyone?
[ 1010.230310] Lustre: Lustre: Build Version: v2_7_55_0-g7cb2e4b-CHANGED-3.10.0-229.4.2.el7.centos_lustre.x86_64
[ 1011.468508] LDISKFS-fs (md2): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache
[ 1016.665780] Lustre: 4595:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1435697020/real 1435697020] req@ffff881f958f8000 x1505437437394948/t0(0) o250->MGC192.168.1.100@tcp@192.168.1.100@tcp:26/25 lens 520/544 e 0 to 1 dl 1435697025 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[ 1022.955966] LustreError: 15f-b: lustre-OST0001: cannot register this server with the MGS: rc = -110. Is the MGS running?
[ 1022.956230] LustreError: 4561:0:(obd_mount_server.c:1789:server_fill_super()) Unable to start targets: -110
[ 1022.956388] LustreError: 4561:0:(obd_mount_server.c:1504:server_put_super()) no obd lustre-OST0001
[ 1022.956469] LustreError: 4561:0:(obd_mount_server.c:137:server_deregister_mount()) lustre-OST0001 not registered
[ 1023.216433] Lustre: server umount lustre-OST0001 complete
[ 1023.216439] LustreError: 4561:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount (-110)
I've ensured that LNET is running ... I was sure to disable SELinux ... these are all
running on an RFC1918 subnet; common broadcast domain; there shouldn't be any
firewalling or anything in the way ... connectivity basically seems okay between the
MGS/MDT and the OSS machines. Any thoughts?
Thanks!
Sean