I'm trying to get Lustre to work on a RHEL 6.6 machine running kernel
The client seems to compile okay, but when I mount I get
"protocol error" from the mount command.
The console shows (paraphrased):
lustre_unpack_rep_ptlrpc_body: bad lustre msg magic: 000000000
unpack ptlrpc body failed
The same machine running 6.5 with a 2.4.3 client seems to work fine,
so I'm fairly certain it's the upgrade to 6.6 that broke things.
We are building a Lustre system.
The OS is RHEL 6.5.
The IB driver is MLNX_OFED_LINUX-2.3-2.0.1-rhel6.5-x86_64.iso
The Lustre version is: ieel-22.214.171.124
The install went fine (using the install script), but we cannot add the IB interface to LNet.
We got the error message below. Could somebody help us get it working?
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/chroma_agent/device_plugins/action_runner.py", line 177, in run
  File "/usr/lib/python2.6/site-packages/chroma_agent/plugin_manager.py", line 283, in run
  File "/usr/lib/python2.6/site-packages/chroma_agent/action_plugins/manage_lnet.py", line 91, in start_lnet
  File "/usr/lib/python2.6/site-packages/chroma_agent/shell.py", line 140, in try_run
CommandExecutionError: Error (1) running 'lctl net up': '' 'LNET configure error 100: Network is down
ko2iblnd: disagrees about version of symbol ib_dealloc_pd
ko2iblnd: Unknown symbol ib_dealloc_pd
ko2iblnd: disagrees about version of symbol ib_fmr_pool_map_phys
ko2iblnd: Unknown symbol ib_fmr_pool_map_phys
LNetError: 30418:0:(api-ni.c:1208:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
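The kernel messages above can be grouped per module to see at a glance which symbols fail to resolve. The following is a minimal sketch, not part of chroma_agent or Lustre: the function name `summarize_symbol_errors` and the regular expressions are assumptions made for illustration, and the sample input is copied from the messages above.

```python
import re
from collections import defaultdict

# Patterns for the two module-loader messages quoted above.
VERSION_RE = re.compile(r"^(\S+): disagrees about version of symbol (\S+)")
UNKNOWN_RE = re.compile(r"^(\S+): Unknown symbol (\S+)")


def summarize_symbol_errors(log_text):
    """Group symbol errors by module name (hypothetical helper).

    Returns {module: {"version_mismatch": set, "unknown": set}}.
    """
    summary = defaultdict(lambda: {"version_mismatch": set(), "unknown": set()})
    for line in log_text.splitlines():
        line = line.strip()
        match = VERSION_RE.match(line)
        if match:
            summary[match.group(1)]["version_mismatch"].add(match.group(2))
            continue
        match = UNKNOWN_RE.match(line)
        if match:
            summary[match.group(1)]["unknown"].add(match.group(2))
    return dict(summary)


SAMPLE = """\
ko2iblnd: disagrees about version of symbol ib_dealloc_pd
ko2iblnd: Unknown symbol ib_dealloc_pd
ko2iblnd: disagrees about version of symbol ib_fmr_pool_map_phys
ko2iblnd: Unknown symbol ib_fmr_pool_map_phys
"""

if __name__ == "__main__":
    for module, errors in summarize_symbol_errors(SAMPLE).items():
        print(module, "version mismatch:", sorted(errors["version_mismatch"]))
        print(module, "unknown:", sorted(errors["unknown"]))
```

A "disagrees about version" message means the symbol checksum the module was built against differs from the one exported by the running kernel or driver stack. With MLNX_OFED installed, this usually suggests that ko2iblnd was built against a different InfiniBand stack than the one currently loaded, though that diagnosis is an inference and not something the log states.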
Now that Lustre 2.7 is coming up soon, I would like to open the discussion
on one of the directions we could go. Recently several projects have sprung
up that impact liblustreapi. During one of those discussions the idea of a new
liblustreapi was brought up: a liblustreapi 2.0, you could say. I would like to
get a feel for what the community thinks about this. If people want this
proposal, I would recommend that we gradually build the new library alongside
the original liblustreapi and link it to the Lustre utilities where necessary.
First, I would like to discuss using the LGPL license for this new library.
I look forward to the feedback.
I just discovered what appears to be working weak-modules support for Lustre 2.5.1 client modules on RHEL6. I saw that our Lustre filesystem was mounted on a host running the 2.6.32-431.29.2.el6.x86_64 kernel but with client modules compiled for 2.6.32_431.23.3.el6.x86_64. Sure enough, the symlinks are in place under /lib/modules/<kernel-version>. I tried booting into a couple of other kernel versions with module symlinks and the Lustre client worked there too. This is a pretty significant feature. When was it introduced? Is it supported?
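To check which modules are symlinked for a given kernel, something like the following can be used. This is a minimal sketch under stated assumptions: the helper name `list_module_symlinks` is hypothetical, and it simply walks the whole per-kernel module tree looking for symlinked `.ko` files (on RHEL these typically live in a `weak-updates` subdirectory).

```python
import os


def list_module_symlinks(modules_root, kernel_release):
    """Return sorted (link_path, target) pairs for symlinked .ko files.

    modules_root is normally "/lib/modules"; it is a parameter so the
    function can be pointed at a test directory (hypothetical helper).
    """
    base = os.path.join(modules_root, kernel_release)
    found = []
    for dirpath, _dirnames, filenames in os.walk(base):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only report kernel modules that are symlinks, not real files.
            if name.endswith(".ko") and os.path.islink(path):
                found.append((path, os.readlink(path)))
    return sorted(found)


if __name__ == "__main__":
    # Inspect the module tree of the currently running kernel.
    for link, target in list_module_symlinks("/lib/modules", os.uname().release):
        print(link, "->", target)
```

Each target shows which kernel's build directory the module really comes from, which makes it easy to confirm that a client is running modules compiled for a different kernel.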
On Sep 15, 2011, at 4:22 PM, Adeyemi Adesanya wrote:
> Hi Brian.
> I don't even see compatibility between "-274" kernels. I built and installed on 2.6.18-274.3.1.el5 but the only module that got symlinked under 2.6.18-274.el5 was libcfs.ko.
> Thanks for the info regarding RedHat and kABI.
> On Sep 15, 2011, at 4:12 PM, Brian J. Murrell wrote:
>> On 11-09-15 06:57 PM, Adesanya, Adeyemi wrote:
>>> I just dug up a message from lustre-discuss last year regarding support for weak-modules. It would be great if I didn't have to rebuild the lustre-modules client RPM (Lustre 1.8.6) against every new RHEL5 kernel that gets released.
>> Indeed, it would be good. You don't have this problem for SLES kernels,
>> FWIW. But for RH kernels, weak (Lustre at least) modules are not
>> possible due to RedHat only supporting a subset of the kABI for weak
>> modules and Lustre utilizes symbols outside of that subset (which they
>> call the "whitelist").
>> I tried to get the whitelist updated for Lustre quite a while ago but
>> was met with silence.
>>> weak-modules reported that nearly all of the modules were incompatible with other kernels including the recent 2.6.18-274.el5, 2.6.18-238.19.1.el5, etc.
>> I'm not positive but I don't think those two kernels are intended to be
>> ABI compatible. My (rather old, so not great) understanding is that the
>> first component after the 2.6.18- identifies kernels that are supposed
>> to be binary compatible and need to be the same between two kernels to
>> ensure kABI compatibility.
>> Brian J. Murrell
>> Senior Software Engineer
>> Whamcloud, Inc.
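The rule of thumb described in the quoted message (the first component after the base version identifies a kernel series) can be sketched as follows. This only encodes that stated understanding; the helper names `kabi_series` and `same_series` and the regular expression are assumptions for illustration, and a matching series does not guarantee that weak-modules will accept a module, as the "-274" example in this thread shows.

```python
import re

# Matches releases such as "2.6.18-274.3.1.el5": the base version, the first
# release component after the hyphen, any further components, then the dist tag.
RELEASE_RE = re.compile(r"^(\d+\.\d+\.\d+)-(\d+)((?:\.\d+)*)\.(el\d+)")


def kabi_series(release):
    """Return (base_version, first_release_component, dist), or None.

    Hypothetical helper; it ignores any trailing architecture suffix.
    """
    match = RELEASE_RE.match(release)
    if not match:
        return None
    return (match.group(1), int(match.group(2)), match.group(4))


def same_series(release_a, release_b):
    """True if both releases parse and share base version, series, and dist."""
    series_a = kabi_series(release_a)
    series_b = kabi_series(release_b)
    return series_a is not None and series_a == series_b


if __name__ == "__main__":
    print(same_series("2.6.18-274.3.1.el5", "2.6.18-274.el5"))
    print(same_series("2.6.18-274.el5", "2.6.18-238.19.1.el5"))
```

Under this rule, 2.6.18-274.3.1.el5 and 2.6.18-274.el5 fall in the same series, while 2.6.18-238.19.1.el5 does not.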