Thanks Richard, I appreciate your advice.
I was able to saturate the channel using XDD with 10 threads writing to 10 OSTs, each OST on a
different OSS, and these are the results:
ETHERNET
                  T   Q        Bytes     Ops     Time      Rate     IOPS  Latency     %CPU
TARGET Average    0   1   2147483648   65536  140.156    15.322   467.59   0.0021    39.16
TARGET Average    1   1   2147483648   65536  140.785    15.254   465.50   0.0021    39.11
TARGET Average    2   1   2147483648   65536  140.559    15.278   466.25   0.0021    39.14
TARGET Average    3   1   2147483648   65536  176.141    12.192   372.07   0.0027    38.02
TARGET Average    4   1   2147483648   65536  168.234    12.765   389.55   0.0026    38.54
TARGET Average    5   1   2147483648   65536  140.823    15.250   465.38   0.0021    39.11
TARGET Average    6   1   2147483648   65536  140.183    15.319   467.50   0.0021    39.16
TARGET Average    8   1   2147483648   65536  176.432    12.172   371.45   0.0027    38.02
TARGET Average    9   1   2147483648   65536  167.944    12.787   390.23   0.0026    38.57
       Combined  10  10  21474836480  655360  180.000   119.305  3640.89   0.0003   387.99
INFINIBAND
                  T   Q        Bytes     Ops     Time      Rate     IOPS  Latency     %CPU
TARGET Average    0   1   2147483648   65536    9.369   229.217  6995.16   0.0001   480.40
TARGET Average    1   1   2147483648   65536    9.540   225.110  6869.80   0.0001   474.25
TARGET Average    2   1   2147483648   65536    8.963   239.582  7311.45   0.0001   479.85
TARGET Average    3   1   2147483648   65536    9.480   226.521  6912.86   0.0001   478.21
TARGET Average    4   1   2147483648   65536    9.109   235.748  7194.47   0.0001   480.83
TARGET Average    5   1   2147483648   65536    9.284   231.299  7058.69   0.0001   479.04
TARGET Average    6   1   2147483648   65536    8.839   242.947  7414.15   0.0001   480.55
TARGET Average    7   1   2147483648   65536    9.210   233.166  7115.65   0.0001   480.17
TARGET Average    8   1   2147483648   65536    9.373   229.125  6992.33   0.0001   475.13
TARGET Average    9   1   2147483648   65536    9.184   233.828  7135.86   0.0001   480.25
       Combined  10  10  21474836480  655360    9.540  2251.097 68698.03   0.0000  4788.69
My estimate is roughly 0.6 Gbit/s over Ethernet (1 Gbit/s max) and 16 Gbit/s over InfiniBand (40 Gbit/s max).
REGARDS!
On 19 May 2014, at 17:37, Mohr Jr, Richard Frank (Rick Mohr) <rmohr(a)utk.edu>
wrote:
Alfonso,
Based on my attempts to benchmark single-client Lustre performance, here are some
comments and advice. (YMMV)
1) On the IB client, I recommend disabling checksums (lctl set_param osc.*.checksums=0).
Having checksums enabled sometimes results in a significant performance hit.
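For reference, a quick way to check and flip that setting on the client is something like the
following (a set_param change is not persistent across a remount, so it would need to be
reapplied for later runs):

    # show the current checksum setting for every OSC on this client
    lctl get_param osc.*.checksums
    # disable checksums for the duration of the benchmark
    lctl set_param osc.*.checksums=0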
2) Single-threaded tests (like dd) will usually bottleneck before you can max out the
total client performance. You need to use a multi-threaded tool (like xdd) and have
several threads perform IO at the same time in order to measure aggregate single client
performance.
3) When using a tool like xdd, set up the test to run for a fixed amount of time rather
than having each thread write a fixed amount of data. If all threads write a fixed amount
of data (say 1 GB), and if any of the threads run slower than others, you might get skewed
results for the aggregate throughput because of the stragglers.
4) In order to avoid contention at the ost level among the multiple threads on a single
client, precreate the output files with stripe_count=1 and statically assign them evenly
to the different osts. Have each thread write to a different file so that no two
processes write to the same ost. If you don't have enough osts to saturate the
client, you can always have two files per ost. Going beyond that will likely hurt more
than help, at least for an ldiskfs backend.
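As a rough sketch of that precreation step (the mount point and directory names are just
placeholders, and which OST indices live on which OSS should be checked first, e.g. with
lfs osts or lctl dl):

    mkdir -p /mnt/lustre/xddtest
    # one output file per OST: single stripe, pinned to OST index $i
    for i in $(seq 0 9); do
        lfs setstripe -c 1 -i $i /mnt/lustre/xddtest/file$i
    done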
5) In my testing, I seem to get worse results using direct I/O for write tests, so I
usually just use buffered I/O. Based on my understanding, the max_dirty_mb parameter on
the client (which defaults to 32 MB) limits the amount of dirty written data that can be
cached for each ost. Unless you have increased this to a very large number, that parameter
will likely mitigate any effects of client caching on the test results. (NOTE: This
reasoning only applies to write tests. Any written data can still be cached by the
client, and a subsequent read test might very well pull data from cache unless you have
taken steps to flush the cached data.)
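To see what the client is actually using, the per-OSC value can be read with:

    lctl get_param osc.*.max_dirty_mb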
If you have 10 oss nodes and 20 osts in your file system, I would start by running a test
with 10 threads and have each thread write to a single ost on different servers. You can
increase/decrease the number of threads as needed to see if the aggregate performance gets
better/worse. On my clients with QDR IB, I typically see aggregate write speeds in the
range of 2.5-3.0 GB/s.
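Purely as an illustration of that kind of run (the option names here are from memory of xdd's
usage and should be checked against xdd's help output; the 32 KB request size simply mirrors
the Bytes/Ops ratio in the results above), the command might look something like:

    xdd -op write -targets 10 /mnt/lustre/xddtest/file{0..9} \
        -blocksize 32768 -reqsize 1 -timelimit 180 -passes 1 -verbose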
You are probably already aware of this, but just in case, make sure that the IB clients
you use for testing don't also have ethernet connections to your OSS servers. If the
client has an ethernet and an IB path to the same server, it will choose one of the paths
to use. It could end up choosing ethernet instead of IB and mess up your results.
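One way to double-check which path the client actually picked is to look at the NIDs, for
example (the import parameter should show the NID of the current connection for each OSC):

    # NIDs configured on the client
    lctl list_nids
    # which server NID each OSC is currently connected to
    lctl get_param osc.*.import | grep current_connection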
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
On May 19, 2014, at 6:33 AM, "Pardo Diaz, Alfonso"
<alfonso.pardo(a)ciemat.es>
wrote:
> Hi,
>
> I have migrated my Lustre 2.2 file system to 2.5.1 and equipped my OSS/MDS and clients
with InfiniBand QDR interfaces.
> I have compiled Lustre against OFED 3.2 and configured the lnet module with:
>
> options lnet networks="o2ib(ib0),tcp(eth0)"
>
>
> But when I try to compare the Lustre performance across InfiniBand (o2ib), I get the
same performance as across Ethernet (tcp):
>
> INFINIBAND TEST:
> dd if=/dev/zero of=test.dat bs=1M count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1,0 GB) copied, 5,88433 s, 178 MB/s
>
> ETHERNET TEST:
> dd if=/dev/zero of=test.dat bs=1M count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1,0 GB) copied, 5,97423 s, 154 MB/s
>
>
> And this is my scenario:
>
> - 1 MDs with SSD RAID10 MDT
> - 10 OSS with 2 OST per OSS
> - Infiniband interface in connected mode
> - Centos 6.5
> - Lustre 2.5.1
> - Striped filesystem: lfs setstripe -s 1M -c 10
>
>
> I know my InfiniBand is working correctly because, when I use iperf3 between client and
servers, I get 40 Gb/s over InfiniBand and 1 Gb/s over the Ethernet connections.
>
>
>
> Could you help me?
>
>
> Regards,
>
>
>
>
>
> Alfonso Pardo Diaz
> System Administrator / Researcher
> c/ Sola nº 1; 10200 Trujillo, ESPAÑA
> Tel: +34 927 65 93 17 Fax: +34 927 32 32 37
>
>
>
>
> ----------------------------
> Disclaimer:
> This message and its attached files is intended exclusively for its recipients and
may contain confidential information. If you received this e-mail in error you are hereby
notified that any dissemination, copy or disclosure of this communication is strictly
prohibited and may be unlawful. In this case, please notify us by a reply and delete this
email and its contents immediately.
> ----------------------------
>
> _______________________________________________
> HPDD-discuss mailing list
> HPDD-discuss(a)lists.01.org
>
https://lists.01.org/mailman/listinfo/hpdd-discuss