Lustre and kernel buffer interaction
by John Bauer
I have been trying to understand a behavior I am observing in an IOR
benchmark on Lustre. I have pared it down to a simple example.
The IOR benchmark is running in MPI mode. There are 2 ranks, each
running on its own node. (Note: the test was run on the "swan" cluster
at Cray Inc., using /lus/scratch.) Each rank does the following:
write a file ( 10 GB )
fsync the file
close the file
MPI_Barrier
open the file that was written by the other rank
read the file that was written by the other rank
close the file that was written by the other rank
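For reference, the read-side effect can be approximated on a single node without MPI; a minimal sketch, where the path and the much smaller file size are placeholder assumptions (the original test wrote 10 GB per rank on /lus/scratch):

```shell
# Write a file and fsync it (dd's conv=fsync calls fsync before closing),
# then observe the kernel's cached memory before and after re-reading it.
FILE=/tmp/cache_test_file   # placeholder path, not the real Lustre mount
dd if=/dev/zero of="$FILE" bs=1M count=64 conv=fsync 2>/dev/null
grep '^Cached:' /proc/meminfo    # cached memory now includes the file's pages
dd if="$FILE" of=/dev/null bs=1M 2>/dev/null
grep '^Cached:' /proc/meminfo
rm -f "$FILE"
```

On a local filesystem the re-read is served from cache; the surprising part in the Lustre case is the stall while Cached: drains before the cross-node read proceeds.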
The writing of each file goes as expected.
The fsync takes very little time (about 0.05 seconds).
The first reads of the file (written by the other rank) start out *very*
slowly. While these first reads are proceeding slowly, the
kernel's cached memory (the Cached: line in /proc/meminfo) decreases
from the size of the file just written to nearly zero.
Once the cached memory has reached nearly zero, the file reading
proceeds as expected.
I have attached a jpg of the instrumentation of the processes that
illustrates this behavior.
My questions are:
Why does the reading of the file, written by the other rank, wait until
the cached data drains to nearly zero before proceeding normally?
Shouldn't the fsync ensure that the file's data is written to the
backing storage, so that this draining of the cached memory is simply
releasing pages with no further I/O?
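As a sanity check of the fsync assumption, one can confirm that the just-written pages really are clean; a sketch, again with a placeholder path and a deliberately small size:

```shell
# After dd's conv=fsync the file's pages should already be written back,
# so the system-wide Dirty: count in /proc/meminfo should stay small.
# Evicting clean pages should then be pure page release, with no I/O.
FILE=/tmp/fsync_test_file   # placeholder path
dd if=/dev/zero of="$FILE" bs=1M count=32 conv=fsync 2>/dev/null
grep '^Dirty:' /proc/meminfo
rm -f "$FILE"
```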
For this case the "dead" time is only about 4 seconds, but this "dead"
time scales directly with the size of the files.
John
--
John Bauer
I/O Doctors LLC
507-766-0378
bauerj(a)iodoctors.com
quotas on 2.4.3
by Matt Bettinger
Hello,
We have a fresh Lustre 2.4.3 upgrade, running on RHEL 6.4, that has not
yet been put into production.
We would like to take a look at quotas, but it looks like there are some
major performance problems with 1.8.9 clients.
Here is how I enabled quotas:
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.mdt=ug
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.ost=ug
[root@lfs-mds-0-0 ~]# lctl get_param osd-*.*.quota_slave.info
osd-ldiskfs.lustre2-MDT0000.quota_slave.info=
target name: lustre2-MDT0000
pool ID: 0
type: md
quota enabled: ug
conn to master: setup
space acct: ug
user uptodate: glb[1],slv[1],reint[0]
group uptodate: glb[1],slv[1],reint[0]
The quotas seem to be working; however, the write performance from a
1.8.9-wc client to 2.4.3 with quotas on is horrific. Am I not setting
quotas up correctly?
I tried to set a simple user quota on the /lustre2/mattb/300MB_QUOTA directory:
[root@hous0036 mattb]# lfs setquota -u l0363734 -b 307200 -B 309200 -i
10000 -I 11000 /lustre2/mattb/300MB_QUOTA/
See that the quota change is in effect:
[root@hous0036 mattb]# lfs quota -u l0363734 /lustre2/mattb/300MB_QUOTA/
Disk quotas for user l0363734 (uid 1378):
Filesystem kbytes quota limit grace files quota limit grace
/lustre2/mattb/300MB_QUOTA/
310292* 307200 309200 - 4 10000 11000 -
Try and write to the quota directory as the user, but get horrible write speed:
[l0363734@hous0036 300MB_QUOTA]$ dd if=/dev/zero of=301MB_FILE bs=1M count=301
301+0 records in
301+0 records out
315621376 bytes (316 MB) copied, 61.7426 seconds, 5.1 MB/s
Try file number 2, and then the quota takes effect, so it seems:
[l0363734@hous0036 300MB_QUOTA]$ dd if=/dev/zero of=301MB_FILE2 bs=1M count=301
dd: writing `301MB_FILE2': Disk quota exceeded
dd: closing output file `301MB_FILE2': Input/output error
If I disable quotas using
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.mdt=none
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.oss=none
Then try and write the same file; the speeds are more like what we
expect, but then we can't use quotas:
[l0363734@hous0036 300MB_QUOTA]$ dd if=/dev/zero of=301MB_FILE2 bs=1M count=301
301+0 records in
301+0 records out
315621376 bytes (316 MB) copied, 0.965009 seconds, 327 MB/s
I have not tried this with a 2.4 client yet, since all of our nodes
are 1.8.x until we rebuild our images.
I was going by the manual at
http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact...
but it looks like I am running into an interoperability issue (which I
thought I had fixed by using the 1.8.9-wc client) or am just not
configuring this correctly.
Thanks!
MB
New liblustreapi ?
by Simmons, James A.
Now that Lustre 2.7 is coming up soon, I'd like to open the discussion
on one of the directions we could go. Recently several projects have sprung
up that impact liblustreapi. During one of those discussions the idea of a new
liblustreapi was brought up. A liblustreapi 2.0, you could say. So I'd like to
get a feel for what the community thinks about this. If people want this proposal, I'd
recommend that we gradually build this new library alongside the original
liblustreapi and link it where necessary to the Lustre utilities. First, I'd
like to discuss using the LGPL license for this new library. I look
forward to the feedback.
lustre 2.4.3 on trusty
by Patrice Hamelin
Hi,
is there any chance I can compile the Lustre 2.4.3 client on Ubuntu 14.04
Trusty, or am I fighting the wind?
Thanks.
--
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Téléphone | Telephone 514-421-5303
Télécopieur | Facsimile 514-421-7231
Gouvernement du Canada | Government of Canada
LU-4185: Incorrect permission handling when creating existing directories
by Patrick Farrell
I'm curious about LU-4185. To recap: when a user attempts to create a
directory which already exists, and they do not have permission to create it,
the standard behavior is to return -EEXIST, not -EPERM. (I am not sure if
this is POSIX, but it is standard on other file systems.)
Depending on the state of the client cache, Lustre will sometimes return
-EPERM. Not only is this non-standard behavior, it seems very
undesirable that the value returned for an operation in a directory
would change depending on whether or not you've read the directory
contents recently.
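The standard behavior described above is easy to check on a local filesystem; a sketch using throwaway paths (not Lustre, and assuming a non-root user so the permission bits actually apply):

```shell
# mkdir(2) on a path that already exists fails with EEXIST even when the
# caller has no write permission on the parent directory.
d=$(mktemp -d)
mkdir "$d/sub"
chmod 555 "$d"                 # drop write permission on the parent
mkdir "$d/sub" 2>&1 || true    # prints "File exists" (EEXIST), not EPERM
chmod 755 "$d"
rm -rf "$d"
```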
This was brought up some time ago, and the fix
(http://review.whamcloud.com/#/c/8257/) stalled because it would harm
open/create performance.
It would be nice to have this addressed.
Is this an issue that Intel expects to fix, either in the way of the patch
above or some other way, or is it a minor divergence from POSIX that
Intel is planning to keep in order to achieve better performance?
Thanks,
- Patrick Farrell
Re: [HPDD-discuss] Lustre staging driver cleanup
by Drokin, Oleg
Hello!
On Aug 24, 2014, at 6:34 AM, Rita Sinha wrote:
> This is a query regarding the Lustre staging driver cleanup especially
> cleaning up the libcfs layer.
>
> The TODOs say "Ideally we can remove include/linux/libcfs entirely."
>
> I request some guidance regarding how to get started with the same.
> Any help would be appreciated.
The majority of libcfs in its present form was envisioned as a platform-independent HAL that would
allow Lustre to be built on multiple platforms like Mac OS X and Windows, and also as a userspace library.
None of that is of any interest in the kernel tree, so it could be considerably wound down and
the functionality moved into the rest of the Lustre tree.
I suspect not all of libcfs will really go away in the end, but significant chunks still can.
Hopefully that helps.
Bye,
Oleg
Is 2.4.3 -> 2.5.x a valid upgrade path?
by E.S. Rosenberg
Can I use the "first upgrade clients, then servers" strategy when moving from
Lustre 2.4.3 to 2.5.x, or is it too big an upgrade (i.e., is the 2.5.x client
compatible with the 2.4.3 server)?
It looks like we may redo the MDS/OSS systems anyhow when the full upgrade
of our system is in progress, but I am thinking that for testing purposes I
could already move the existing Lustre clients to 2.5.x ahead of time...
Thanks,
Eli
Fwd: Lustre staging driver cleanup
by Rita Sinha
---------- Forwarded message ----------
From: Rita Sinha <rita.sinha89(a)gmail.com>
Date: Sun, Aug 24, 2014 at 4:04 PM
Subject: Lustre staging driver cleanup
To: andreas.dilger(a)intel.com, Oleg Drokin <oleg.drokin(a)intel.com>
Cc: hpdd-discuss(a)lists.01.org, greg(a)kroah.com
Hi,
This is a query regarding the Lustre staging driver cleanup especially
cleaning up the libcfs layer.
The TODOs say "Ideally we can remove include/linux/libcfs entirely."
I request some guidance regarding how to get started with the same.
Any help would be appreciated.
Regards,
Rita Sinha