Hi,
Yesterday I tried adding
options ko2iblnd peer_credits=126 concurrent_sends=63
to /etc/modprobe.d/lnet.conf on all of our IB-connected Lustre 1.8 clients and servers.
I wanted to try this after reading Section VI of this CUG12 paper:
https://cug.org/proceedings/attendee_program_cug2012/includes/files/pap16...
Here are the resulting client syslog messages:
Jul 2 18:46:02 c0a-s1 kernel: current num of QPs 0x7
Jul 2 18:46:02 c0a-s1 kernel: command failed, status bad parameter(0x3), syndrome 0x317227
Jul 2 18:46:02 c0a-s1 kernel: LustreError: 2360:0:(o2iblnd.c:808:kiblnd_create_conn()) Can't create QP: -22, send_wr: 16191, recv_wr: 130
Here is a corresponding server message:
Jul 2 18:47:30 ts-lfs-01 kernel: LustreError: 7744:0:(o2iblnd_cb.c:2529:kiblnd_rejected()) 10.13.68.116@o2ib rejected: o2iblnd no resources
I'm not sure what to make of this. Are the peer_credits and concurrent_sends values I
tried too large? Are there other parameters that must also be changed in order to set
o2iblnd peer_credits and/or concurrent_sends?
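One thing I noticed: the send_wr figure in the client message seems to scale directly with concurrent_sends. If I'm reading the 1.8 o2iblnd code correctly (an assumption I haven't verified), each connection's send queue is sized at (1 + 256 RDMA fragments) x concurrent_sends, and 257 x 63 is exactly 16191. A quick sketch of that arithmetic, where the 256-fragment constant and the helper name are my own assumptions rather than anything taken from the Lustre source:

```python
# Rough estimate of the send work requests o2iblnd requests per QP.
# ASSUMPTION (not verified against the Lustre source):
#   send_wr = (1 + rdma_frags) * concurrent_sends, with rdma_frags = 256.
# This reproduces "send_wr: 16191" from the client log for concurrent_sends=63.

RDMA_FRAGS = 256  # assumed maximum number of RDMA fragments per message


def estimated_send_wr(concurrent_sends: int, rdma_frags: int = RDMA_FRAGS) -> int:
    """Return the assumed send work-request count for one connection's QP."""
    return (1 + rdma_frags) * concurrent_sends


if __name__ == "__main__":
    # Compare these numbers with max_qp_wr as reported by `ibv_devinfo -v`.
    for sends in (8, 16, 32, 63):
        print(f"concurrent_sends={sends:3d} -> send_wr={estimated_send_wr(sends)}")
```

I haven't confirmed whether roughly 16K send entries per QP exceeds what our HCA actually allows once the driver adds its own overhead, so treat this as a guess at where the "bad parameter" comes from.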
Thanks,
Craig