Hi hpdd-discuss,
We are attempting to run the Lustre 2.7.0 client. We downloaded the RPMs
from hpdd (built against the Red Hat 2.6.32-504.8.1 kernel).
However, we encounter "bad lustre msg magic: 00000000" on clients with
Intel True Scale 73xx HCAs.
First, we tried mounting from Lustre 2.1.5 servers:
Lustre: Lustre: Build Version:
2.7.0-RC4--PRISTINE-2.6.32-504.8.1.el6.x86_64
LNet: Added LNI 10.191.132.81@o2ib [8/256/0/180]
Lustre: Server MGS version (2.1.5.0) is much older than client. Consider upgrading server
(2.7.0)
LustreError: 23725:0:(pack_generic.c:662:lustre_unpack_rep_ptlrpc_body()) bad lustre msg
magic: 00000000
LustreError: 23725:0:(client.c:338:unpack_reply()) @@@ unpack ptlrpc body failed: -22
req@ffff8808e3e65980 x1510142533173264/t0(0)
o503->MGC10.191.128.11@o2ib@10.191.128.11@o2ib:26/25 lens 272/8384 e 0 to 0 dl
1440184155 ref 2 fl Rpc:R/0/0 rc 0/-22
LustreError: 23725:0:(mgc_request.c:248:do_config_log_add()) failed processing sptlrpc
log: -71
LustreError: 15c-8: MGC10.191.128.11@o2ib: The configuration from log
'scratch-client' failed (-71). This may be the result of communication errors
between this node and the MGS, a bad configuration, or other errors. See the syslog for
more information.
Next, we tried building the client from the Lustre master branch (tag 2_7_58)
against the latest RHEL6 kernel; we received the same error message:
Lustre: Lustre: Build Version:
v2_7_58_0--PRISTINE-2.6.32-504.16.2.el6.x86_64
LNet: Added LNI 10.191.132.81@o2ib [8/256/0/180]
Lustre: Server MGS version (2.1.5.0) is much older than client. Consider upgrading server
(2.7.58)
LustreError: 535:0:(pack_generic.c:665:lustre_unpack_rep_ptlrpc_body()) bad lustre msg
magic: 00000000
LustreError: 535:0:(client.c:407:unpack_reply()) @@@ unpack ptlrpc body failed: -22
req@ffff880494189980 x1510140514664464/t0(0)
o503->MGC10.191.128.11@o2ib@10.191.128.11@o2ib:26/25 lens 272/8384 e 0 to 0 dl
1440182267 ref 2 fl Rpc:R/0/0 rc 0/-22
LustreError: 535:0:(mgc_request.c:249:do_config_log_add()) failed processing sptlrpc log:
-71
LustreError: 15c-8: MGC10.191.128.11@o2ib: The configuration from log
'scratch-client' failed (-71). This may be the result of communication errors
between this node and the MGS, a bad configuration, or other errors. See the syslog for
more information.
Finally, we tried mounting from different servers running Lustre 2.5.3 and
received the same "bad msg magic" error:
Lustre: Lustre: Build Version:
v2_7_58_0--PRISTINE-2.6.32-504.16.2.el6.x86_64
LNet: Added LNI 10.191.132.81@o2ib [8/256/0/180]
LustreError: 15384:0:(pack_generic.c:665:lustre_unpack_rep_ptlrpc_body()) bad lustre msg
magic: 00000000
LustreError: 15384:0:(client.c:407:unpack_reply()) @@@ unpack ptlrpc body failed: -22
req@ffff8804972d9980 x1510145715601432/t0(0)
o503->MGC10.191.128.13@o2ib@10.191.128.13@o2ib:26/25 lens 272/8384 e 0 to 0 dl
1440187190 ref 2 fl Rpc:R/0/0 rc 0/-22
LustreError: 15c-8: MGC10.191.128.13@o2ib: The configuration from log
'testfs-client' failed (-71). This may be the result of communication errors
between this node and the MGS, a bad configuration, or other errors. See the syslog for
more information.
Lustre: Unmounted testfs-client
LustreError: 15384:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount (-71)
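For anyone reading the logs above: as far as I understand, the negative return codes are negated Linux errno values, so -22 would be EINVAL (invalid argument) and -71 would be EPROTO (protocol error). Below is a minimal Python sketch for pulling those codes out of a log line. The helper name decode_return_codes and the small lookup table are hypothetical and not part of Lustre; the table is hardcoded with the Linux values so the result does not depend on the machine running the script.

```python
import re

# Linux errno numbers seen in the logs above (assumption: the kernel
# reports failures as negated errno values).
LINUX_ERRNO = {22: "EINVAL", 71: "EPROTO"}

# A minus sign followed by digits, not preceded by a word character or a
# dot, so version strings like "2.6.32-504.8.1" are not picked up.
NEG_CODE = re.compile(r"(?<![\w.])-(\d+)\b")


def decode_return_codes(line):
    """Return (code, name) pairs for known negative codes in a log line."""
    found = []
    for match in NEG_CODE.finditer(line):
        code = int(match.group(1))
        if code in LINUX_ERRNO:
            found.append((-code, LINUX_ERRNO[code]))
    return found


if __name__ == "__main__":
    sample = "(obd_mount.c:1342:lustre_fill_super()) Unable to mount (-71)"
    for code, name in decode_return_codes(sample):
        print(code, name)
```

Codes that are not in the table are silently skipped, so the table would need extending for other errors.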
I suspect the issue is related to RHEL6 kernel changes. Interestingly, we
do not have issues with clients using ConnectX HCAs.
regards,
chris hunter
yale hpc group