Lustre performance-related question
by Suresh Shelvapille
Folks:
I am new to Lustre. I have Lustre 2.3.0 installed on CentOS 6.3 (2.6.32-279 kernel) on two nodes.
These nodes are connected via 40 Gbps IB interfaces.
Node-1, acting as the Lustre server, runs one MGS/MDT (on /dev/sda3) and one OST (on /dev/sda4).
Both /dev/sda3 and /dev/sda4 are on a regular SCSI disk with 256 GB capacity (6 Gbps disk speed).
Node-2 is the Lustre client.
When I run "dd if=/dev/zero of=/mnt-point-lustre bs=30M count=1"
I get about 550 MB/s, which is reasonable. Now, if I change the block-size/count combination to
anything less or more than 30M, the performance drops considerably.
What is so magical about 30 megabytes? What parameters can I tune?
Preferably I would like to use bigger file sizes such as 1G, 5G, 10G, and beyond.
Is it even possible to get better throughput for bigger files with a single OST/MDT combination,
or do I need multiple OSTs, etc.?
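For reference, here is roughly the sweep I was planning to run (a sketch only;
/mnt/lustre is a placeholder for my client mount point):
# write a ~1 GB file at several block sizes; conv=fsync forces a flush
# so the timing is not just the client-side page cache
for bs in 1 4 16 30 64; do
    dd if=/dev/zero of=/mnt/lustre/testfile bs=${bs}M count=$((1024 / bs)) conv=fsync
    rm -f /mnt/lustre/testfile
done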
Many thanks in advance for your help.
Suri
Re: [HPDD-discuss] [Lustre-devel] [wc-discuss] Important changes to libcfs primitives usage.
by Dilger, Andreas
On 12/19/12 5:32 AM, "Alexey Lyahkov" <alexey_lyashkov(a)xyratex.com> wrote:
>That is the second question:
>how do we plan to maintain a stable version that builds with older kernels from
>RHEL, SuSE, and maybe Debian?
>How does WC plan to sync changes between the stable tree and development in the
>kernel?
This will need some extra effort, but it needs to happen. That is why we are now
working to clean up the current Lustre code to be acceptable to the upstream
kernel. If the out-of-kernel Lustre code is wildly different from the in-kernel
Lustre code, then it will be far too much effort to keep the two in sync.
By changing the compatibility macros and cfs_* wrappers to match the upstream
kernel, splitting the client/server code, etc., we can make the out-of-tree code
match the in-kernel code very closely, so porting patches back and forth will be
much less work.
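To give a flavor of the mechanical side, the conversion is mostly a bulk rename
that drops the wrapper prefix. A sketch only (the real conversion is driven by a
sed script carried in the Lustre tree, mentioned elsewhere in this thread):
# illustrative rules only, not the actual script: drop the cfs_ prefix
# so the names match the upstream kernel primitives
sed -i -e 's/\bcfs_spin_lock\b/spin_lock/g' \
       -e 's/\bcfs_atomic_read\b/atomic_read/g' \
       `find . -name "*.h" -or -name "*.c"`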
>What is planned in the case of huge API changes? As an example of such a change:
>the splice API was added in the 2.6.20 kernel and the sendfile API was killed, so
>we would lose all kernels with sendfile support (RHEL5, SLES9/10).
>Same for the page fault API - there were 3 versions between 2.6.21 and 2.6.26
>- so old kernels would also be dropped from support,
>or we will have never-tested code in the 'stable' tree.
This is no different from the situation today. There are large API changes in
the upstream kernel (e.g. VFS changes in 2.6.37), and we are always playing
catch-up with them. Through the work of EMC we are close to being in sync with
the upstream kernel (3.6 at least), and being part of the upstream kernel will
definitely help avoid this problem. Then, patches that change an API in the
kernel will also be applied to the Lustre code immediately.
Once Lustre is in the upstream kernel, it will eventually get into the vendor
kernels. It may also be possible to get a backport (i.e. the current version)
included into a vendor kernel, since vendors are less reluctant to do so for
code that is already in the upstream kernel.
>The reason to use autoconf macros is that we can't trust the RHEL/SuSE kernel
>version.
Sure, we will need this for the out-of-tree Lustre code, as we do today, but
not inside the kernel. The benefit is that the in-tree code will be the "right"
code for that particular kernel version, and no macros will be needed.
Cheers, Andreas
>On Dec 19, 2012, at 04:12, Prakash Surya wrote:
>>>> ... that matched what the kernel was running.
>>>> That won't be an issue with Lustre, but you get the idea.
>>>
>>> So let me just confirm I understand your issue here. For example, the
>>> Linux kernel is at 3.12.1 (version numbers are purely hypothetical and
>>> far into the future) which has Lustre 2.6.3 in it. But RHEL 9's kernel
>>> is only 3.8.1"ish" and has Lustre 2.4.2 (because that's what was in the
>>> upstream Linux 3.8.1 kernel) in it.
>>>
>>> The complaint is that the Lustre (2.4.2) that's in the RHEL 9 kernel
>>> that you are using is too old for some reason?
>>>
>>> So what's the remedy? You can either (a) download the stand-alone
>>> Lustre module source for the version of Lustre you need and build it
>>> against the RHEL 9 kernel you want to use (assuming they are near enough
>>> to be compatible -- which is the same situation that exists today) or
>>
>> That is a big assumption, and is most likely false. Currently autoconf
>> maintains the compatibility of Lustre with the different kernel versions.
>> If the client is in tree, are we really going to maintain the same level
>> of autoconf "foo" to be compatible with older kernels? And if not, we
>> would have to maintain a second package, outside of the kernel, which did
>> have this autoconf compatibility layer.
>>
>> I don't think option (a) is the same as what we have now.
>>
>> --
>> Cheers, Prakash
>>
>>> (b) you download the entire upstream kernel that has the version of
>>> Lustre that you want to use and build that.
>>>
>>> It seems to me that (a) is what everyone already has to do all of the
>>> time today anyway (assuming the binary RPM that we ship for the given
>>> vendor kernel is not new enough) and (b) just isn't even possible so
>>> that's gravy.
>>>
>>> There might be something obvious that I'm missing, but I'm failing to
>>> see how having Lustre (client) in the kernel is any worse than the
>>> status quo (basically scenario (a) above) and indeed should provide
>>> Lustre compatibility to the current stable Linux sooner than we can
>>> typically turn it around.
>>>
>>> Cheers,
>>> b.
>>>
>
>----------------------------------------------
>Alexey Lyahkov
>alexey_lyashkov(a)xyratex.com
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
Re: [HPDD-discuss] [Lustre-discuss] How smart is Lustre?
by Dilger, Andreas
On 2012-12-19, at 11:22, "Allen, Benjamin S" <bsa(a)lanl.gov> wrote:
Hi Jason,
2. Having two paths to your storage should speed things up. I'm guessing you'd have more than one LUN on the array, so you could do something as simple as splitting the LUNs between the two paths, or use round robin to balance the traffic between the two paths, etc.
Using round-robin is not a good idea. It will not increase bandwidth (which is already constrained by the disk and bus), and on some RAID controllers it will cause a severe performance impact.
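If the goal is simply to keep both paths usable without spreading each LUN's I/O
across them, something like this dm-multipath setting is one option (a generic
sketch, not tuned for any particular array):
# /etc/multipath.conf fragment (sketch): one active path per LUN,
# with the second path held in reserve for failover instead of round-robin
defaults {
        path_grouping_policy    failover
}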
Cheers, Andreas
Re: [HPDD-discuss] [Lustre-devel] [wc-discuss] Important changes to libcfs primitives usage.
by Dilger, Andreas
On 2012-12-19, at 5:27, "Alexey Lyahkov" <alexey_lyashkov(a)xyratex.com> wrote:
> Nice idea, but what about name conflicts between Linux and FreeBSD?
> Both have atomics, but with different arguments -
> atomic_set and atomic_add, for example.
>
> Did you do any research before making that patch, to be sure no conflicts exist?
Yes, it is possible that such conflicts exist. However, it is impossible to know of such conflicts when there is no code in the Lustre tree that breaks, and no way to test it.
I'm aware that you have been working on a FreeBSD FUSE port at times. Have you ever released the patches somewhere? There never was any code in the Lustre tree for it and I've never heard of any users. Virtually (?) all of the Lustre users are on Linux, and making it easier for Linux users to use Lustre is most important.
I don't want to make it impossible to port to FreeBSD, but if there is a workaround as Xuezhao has proposed, and there are no real users of this code, then the benefits of broader Lustre acceptance outweigh the inconvenience to one or two people developing the FreeBSD FUSE port.
Cheers, Andreas
> On Dec 17, 2012, at 22:34, Andreas Dilger wrote:
>
>> On 2012-12-17, at 10:09 AM, John Hammond wrote:
>>> On 12/05/2012 07:54 AM, Oleg Drokin wrote:
>>>> I just landed the first patch of the series to reduce usage of our libcfs_ wrappers for kernel primitives like libcfs_spin_lock/unlock...
>>>> You can see the actual change here: http://review.whamcloud.com/#change,2829
>>>>
>>>> It's highly likely that plenty of patches will be affected. To make our job easier, there is a
>>>> build/libcfs_cleanup.sed script included; you can run it on all your .c and .h files to make the necessary replacements:
>>>> sed -i -f build/libcfs_cleanup.sed `find . -name "*.h" -or -name "*.c"`
>>>>
>>>> Please also be advised that more changes like this are coming (the timeline is not very clear ATM; we might be able to wait with the rest until
>>>> after the feature freeze) and the sed script will be updated accordingly.
>>>
>>> I have been wondering about wrappers and typedefs not affected by this change, for example cfs_get_cpu(), cfs_atomic_read() and cfs_proc_dir_entry_t. In new code and patches should we use the cfs names or their Linux equivalents, get_cpu(), atomic_read(), and struct proc_dir_entry?
>>
>> Ideally, new patches would use the Linux primitives. However, if they are in client-side code that is compiled for liblustre, then the liblustre builds would fail until the wrappers are renamed to their Linux equivalents (i.e. removing the "cfs_" prefix).
>>
>> For server-side code and/or llite it should be fine to use the native Linux functions.
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger Whamcloud, Inc.
>> Principal Lustre Engineer http://www.whamcloud.com/
>
> ----------------------------------------------
> Alexey Lyahkov
> alexey_lyashkov(a)xyratex.com
Re: [HPDD-discuss] [Lustre-devel] ZFS OSD
by Dilger, Andreas
On 2012-12-18, at 17:29, John Bent <johnbent(a)gmail.com> wrote:
> Is the code available for the ZFS OSD that's in 2.4? I was curious to
> see what that looks like but couldn't find it online. Perhaps I
> missed it?
John, yes, the osd-zfs source code is available in the Lustre git repository, in the master branch only. It was also in the 2.3 release as a preview, for OSTs only, while the 2.4 release will be the first one that is functional for the MDT.
The ZFS code itself is available at zfsonlinux.org in various forms.
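If it helps, one way to browse it (assuming the Whamcloud git mirror is the
right location, and that the code lives under lustre/osd-zfs in master):
git clone git://git.whamcloud.com/fs/lustre-release.git
ls lustre-release/lustre/osd-zfs/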
Cheers, Andreas
Re: [HPDD-discuss] [Lustre-devel] [wc-discuss] Important changes to libcfs primitives usage.
by Dilger, Andreas
On 12/18/12 5:12 PM, "Prakash Surya" <surya1(a)llnl.gov> wrote:
>On Tue, Dec 18, 2012 at 05:12:27PM -0500, Brian J. Murrell wrote:
>> On Mon, 2012-12-17 at 23:12 -0500, Ken Hornstein wrote:
>> > There's a HUGE
>> > difference in practice between "feature X appears in Linux kernel
>> > version Y" and "My RedHat release Z has a particular feature". That's
>> > where life gets complicated. Many times we're stuck on a particular
>> > kernel for various complicated reasons, yet we need to upgrade Lustre
>> > ... or vice versa. It's kind of like Infiniband in the Linux kernel ...
>> > at best, it doesn't hurt us, but it's always the "wrong" version (or so
>> > my Infiniband guys tell me). We never end up using the Infiniband in
>> > the kernel, and sometimes (depending on the vagaries of the distro)
>> > that screws us up hard; part of that is because for a long time one
>> > particular distro never would distribute the development symbols for
>> > the Infiniband in the kernel that matched what the kernel was running.
>> > That won't be an issue with Lustre, but you get the idea.
>>
>> So let me just confirm I understand your issue here. For example, the
>> Linux kernel is at 3.12.1 (version numbers are purely hypothetical and
>> far into the future) which has Lustre 2.6.3 in it. But RHEL 9's kernel
>> is only 3.8.1"ish" and has Lustre 2.4.2 (because that's what was in the
>> upstream Linux 3.8.1 kernel) in it.
>>
>> The complaint is that the Lustre (2.4.2) that's in the RHEL 9 kernel
>> that you are using is too old for some reason?
>>
>> So what's the remedy? You can either (a) download the stand-alone
>> Lustre module source for the version of Lustre you need and build it
>> against the RHEL 9 kernel you want to use (assuming they are near enough
>> to be compatible -- which is the same situation that exists today) or
>
>That is a big assumption, and is most likely false. Currently autoconf
>maintains the compatibility of Lustre with the different kernel versions.
>If the client is in tree, are we really going to maintain the same level
>of autoconf "foo" to be compatible with older kernels? And if not, we
>would have to maintain a second package, outside of the kernel, which did
>have this autoconf compatibility layer.
>
>I don't think option (a) is the same as what we have now.
We _already_ have to maintain this autoconf layer for the out-of-kernel
tree, and we've had to do it basically forever. At least if Lustre is in
the kernel, it is _possible_ that the version that is in the vendor
release is "good enough" for a large number of users (probably more than
exist today).
If we align the Lustre "maintenance" branch with the kernel version that
goes into the distro long-term-maintenance kernels (e.g. Lustre 3.0 and
Linux 3.24 for RHEL8) that wouldn't be very different from what we are
doing today, but it would be a lot less hassle for most users. If someone
wants to run a new kernel and new Lustre development release, then they
will get both at the same time.
The corner case is a development version of Lustre with an older vendor
kernel, which would require making the new version of Lustre build
against the vendor kernel. The in-kernel version of Lustre would no
longer have the autoconf and kernel-portability wrappers, but at least for
a few years there would be an out-of-kernel version of Lustre until it is
available in the vendor kernels (only updated every few years).
Having Lustre in the kernel would also tend to provide a more "stable"
point for the Lustre client and network protocol (as commented elsewhere),
assuming we can line up the Lustre maintenance releases with the kernel
chosen by the vendors. We already need to provide interop and upgrade
support for Lustre, regardless of whether it is in the kernel or not.
It may mean that the "maintenance" version of Lustre is chosen by Red Hat
when they decide on which kernel to use for, say, RHEL8. We would push
Lustre fixes into the "stable" kernel branch used by RHEL8 and/or directly
to RH.
The most difficult part will be the transition, when there is a version of
Lustre in the kernel, but there is still a need to maintain Lustre out of
the tree for the older kernels that do not have it yet. That would
probably take 3-6 years before we could reasonably deprecate the oldest
vendor kernel that does not have any Lustre support in it.
>> (b) you download the entire upstream kernel that has the version of
>> Lustre that you want to use and build that.
>>
>> It seems to me that (a) is what everyone already has to do all of the
>> time today anyway (assuming the binary RPM that we ship for the given
>> vendor kernel is not new enough) and (b) just isn't even possible so
>> that's gravy.
>>
>> There might be something obvious that I'm missing, but I'm failing to
>> see how having Lustre (client) in the kernel is any worse than the
>> status quo (basically scenario (a) above) and indeed should provide
>> Lustre compatibility to the current stable Linux sooner than we can
>> typically turn it around.
>>
>> Cheers,
>> b.
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division