Lustre and kernel buffer interaction
by John Bauer
I have been trying to understand a behavior I am observing in an IOR
benchmark on Lustre. I have pared it down to a simple example.
The IOR benchmark is running in MPI mode with 2 ranks, each running on
its own node. (The test was run on the "swan" cluster at Cray Inc.,
using /lus/scratch.) Each rank does the following:
- write a file (10 GB)
- fsync the file
- close the file
- MPI_Barrier
- open the file that was written by the other rank
- read the file that was written by the other rank
- close the file that was written by the other rank
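The per-rank sequence above can be sketched as follows. This is a minimal single-node analogue, not the real IOR run: it uses Python's multiprocessing in place of MPI (so MPI_Barrier becomes multiprocessing.Barrier, and the two "ranks" are local processes rather than processes on separate nodes), and the 10 GB file is shrunk to 4 MB so the sketch runs quickly. File names and sizes are placeholders.

```python
import multiprocessing as mp
import os
import tempfile

SIZE = 4 * 1024 * 1024  # stand-in for the 10 GB file in the real test

def rank(rank_id, peer_id, barrier, workdir):
    # Steps 1-3: write a file, fsync it, close it.
    path = os.path.join(workdir, f"file.{rank_id}")
    with open(path, "wb") as f:
        f.write(b"\0" * SIZE)
        f.flush()
        os.fsync(f.fileno())  # data should now be on backing storage
    # Step 4: barrier (MPI_Barrier in the real IOR run).
    barrier.wait()
    # Steps 5-7: open, read, and close the file written by the peer rank.
    peer_path = os.path.join(workdir, f"file.{peer_id}")
    with open(peer_path, "rb") as f:
        data = f.read()
    assert len(data) == SIZE

def main():
    barrier = mp.Barrier(2)
    with tempfile.TemporaryDirectory() as workdir:
        procs = [mp.Process(target=rank, args=(i, 1 - i, barrier, workdir))
                 for i in (0, 1)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        assert all(p.exitcode == 0 for p in procs)
    print("both ranks completed")

if __name__ == "__main__":
    main()
```

On a local file system both reads are fast, since the pages are already cached; the slow-read behavior described below is specific to the two-node Lustre case.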
The writing of each file goes as expected, and the fsync takes very
little time (about 0.05 seconds).
The first reads of the file (written by the other rank) start out
*very* slowly. While these first reads are proceeding slowly, the
kernel's cached memory (the Cached: line in /proc/meminfo) decreases
from the size of the file just written to nearly zero.
Once the cached memory has reached nearly zero, the file reading
proceeds as expected.
I have attached a jpg of the instrumentation of the processes that
illustrates this behavior.
My questions are:
Why does the reading of the file written by the other rank wait until
the cached data drains to nearly zero before proceeding normally?
Shouldn't the fsync ensure that the file's data is written to the
backing storage, so that draining the cached memory is simply a matter
of releasing pages, with no further I/O?
In this case the "dead" time is only about 4 seconds, but it scales
directly with the size of the files.
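For what it's worth, fsync() only guarantees that the data has reached backing storage; it does not evict the now-clean pages from the writer's page cache. One common way to release them explicitly is posix_fadvise(POSIX_FADV_DONTNEED) after the fsync. The sketch below shows that distinction on the writer side only (it says nothing about why the *reader* node stalls, which is the real question here); it assumes Linux and Python 3.3+ for os.posix_fadvise.

```python
import os
import tempfile

def write_and_release(path, data):
    """Write data, force it to backing storage, then hint the kernel
    that the cached pages will not be needed again."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # data is durable, but the pages stay cached
        # Ask the kernel to drop the (now clean) cached pages;
        # length 0 means "to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name
write_and_release(path, b"x" * (1 << 20))  # 1 MB stand-in for the 10 GB file
with open(path, "rb") as f:
    assert f.read(1) == b"x"  # the data survives the cache drop
os.unlink(path)
print("ok")
```

Whether dropping the writer's cached pages would change the cross-node read behavior on Lustre is exactly the open question; the sketch only illustrates that fsync durability and page-cache release are separate operations.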
John
--
John Bauer
I/O Doctors LLC
507-766-0378
bauerj(a)iodoctors.com
7 years, 2 months
quotas on 2.4.3
by Matt Bettinger
Hello,
We have a fresh 2.4.3 Lustre upgrade, not yet in production, running
on RHEL 6.4.
We would like to try out quotas, but it looks like there are major
performance problems with 1.8.9 clients.
Here is how I enabled quotas:
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.mdt=ug
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.ost=ug
[root@lfs-mds-0-0 ~]# lctl get_param osd-*.*.quota_slave.info
osd-ldiskfs.lustre2-MDT0000.quota_slave.info=
target name: lustre2-MDT0000
pool ID: 0
type: md
quota enabled: ug
conn to master: setup
space acct: ug
user uptodate: glb[1],slv[1],reint[0]
group uptodate: glb[1],slv[1],reint[0]
The quotas seem to be working; however, the write performance from a
1.8.9-wc client to 2.4.3 with quotas on is horrific. Am I not setting
quotas up correctly?
I set a simple user quota on the /lustre2/mattb/300MB_QUOTA directory:
[root@hous0036 mattb]# lfs setquota -u l0363734 -b 307200 -B 309200 -i
10000 -I 11000 /lustre2/mattb/300MB_QUOTA/
The quota change is in effect:
[root@hous0036 mattb]# lfs quota -u l0363734 /lustre2/mattb/300MB_QUOTA/
Disk quotas for user l0363734 (uid 1378):
Filesystem kbytes quota limit grace files quota limit grace
/lustre2/mattb/300MB_QUOTA/
310292* 307200 309200 - 4 10000 11000 -
Writing to the quota directory as the user gives horrible write speed:
[l0363734@hous0036 300MB_QUOTA]$ dd if=/dev/zero of=301MB_FILE bs=1M count=301
301+0 records in
301+0 records out
315621376 bytes (316 MB) copied, 61.7426 seconds, 5.1 MB/s
With a second file the quota takes effect, as expected:
[l0363734@hous0036 300MB_QUOTA]$ dd if=/dev/zero of=301MB_FILE2 bs=1M count=301
dd: writing `301MB_FILE2': Disk quota exceeded
dd: closing output file `301MB_FILE2': Input/output error
If I disable quotas using
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.mdt=none
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.oss=none
and then write the same file, the speeds are more like what we expect,
but then of course we can't use quotas:
[l0363734@hous0036 300MB_QUOTA]$ dd if=/dev/zero of=301MB_FILE2 bs=1M count=301
301+0 records in
301+0 records out
315621376 bytes (316 MB) copied, 0.965009 seconds, 327 MB/s
I have not tried this with a 2.4 client yet, since all of our nodes
are 1.8.X until we rebuild our images.
I was going by the manual at
http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact...
but it looks like I am running into an interoperability issue (which I
thought I had fixed by using the 1.8.9-wc client), or I am just not
configuring this correctly.
Thanks!
MB
7 years, 5 months
New liblustreapi ?
by Simmons, James A.
Now that Lustre 2.7 is coming up soon, I would like to open a
discussion on one of the directions we could go. Recently several
projects have sprung up that impact liblustreapi, and during one of
those discussions the idea of a new liblustreapi was brought up: a
liblustreapi 2.0, you could say. So I would like to get a feel for
where the community stands on this. If people want this proposal, I
recommend that we gradually build the new library alongside the
original liblustreapi and link it into the Lustre utilities where
necessary. First, I would like to discuss using the LGPL license for
this new library. I look forward to the feedback.
7 years, 6 months
Problem mounting clients after update to 2.5.2
by Oliver Mangold
Hi,
I updated a Lustre installation from 2.1.6 to 2.5.2, and now something
seems to be wrong with the llogs. After running writeconf on all
targets I can mount a client *once*, but after unmounting that client,
all subsequent attempts to mount clients result in errors like this:
2014-07-31T12:57:12.198389+02:00 l3mds1 <kern.err> kernel:LustreError:
68983:0:(obd_mount.c:1323:lustre_fill_super()) Unable to mount (-5)
2014-07-31T12:57:13.198412+02:00 l3mds1 <kern.err> kernel:LustreError:
15c-8: MGC10.3.6.21@o2ib: The configuration from log 'lustre3-client'
failed (-5). This may be the result of communication errors between this
node and the MGS, a bad configuration, or other errors. See the syslog
for more information.
2014-07-31T12:57:13.198428+02:00 l3mds1 <kern.err> kernel:LustreError:
69606:0:(llite_lib.c:1046:ll_fill_super()) Unable to process log: -5
Any ideas how to fix this?
--
Dr. Oliver Mangold
System Analyst
NEC Deutschland GmbH
HPC Division
Hessbrühlstraße 21b
70565 Stuttgart
Germany
Phone: +49 711 78055 13
Mail: oliver.mangold(a)emea.nec.com
7 years, 11 months
Moving the MDT performing File-level backups
by Ramiro Alba
Hello everybody,
I am currently using Lustre 1.8.5 on the servers, with a SLES kernel
(2.6.32.19-0.2.1-lustre.1.8.5).
Before the year is over I'll upgrade to the current Lustre maintenance
release (currently 2.4.x), but right now I need to move the MDT to
another LUN on the MDS.
I did a test on an MDT LVM snapshot, following the moving procedure
described below, but I found some issues I would like to ask about:
1) The backup using the tar command took a very long time (20 hours),
even though the MDT is quite small (197 MB). Is that normal?
2) The restored 'ldiskfs' file system is a bit smaller than the
original (by about 20 MB, according to du -sm). Should I worry?
3) When backing up extended attributes with the 'getfattr' command I
get some errors of the form:
getfattr: ./ROOT/<file path>: No such file or directory
I could see that these are symbolic links using absolute paths. Can
that be a problem?
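On question 3: getfattr dereferences symbolic links by default, so a link whose absolute-path target exists only on the mounted Lustre file system (and not under the ldiskfs mount point) produces exactly this error; getfattr's -h (--no-dereference) option operates on the link itself instead. A small sketch of the same distinction using Python's os.listxattr (Linux only; the link target below is a made-up placeholder):

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
link = os.path.join(workdir, "dangling")
# An absolute-path symlink whose target does not exist under this tree,
# mimicking a Lustre-side path seen from the ldiskfs mount.
os.symlink("/lustre/some/absolute/path", link)

# Following the link (getfattr's default) fails, as in the error above:
try:
    os.listxattr(link)  # follow_symlinks=True by default
    raise AssertionError("expected FileNotFoundError")
except FileNotFoundError:
    print("dereferencing the dangling link fails, as getfattr reports")

# Operating on the link itself (getfattr -h / --no-dereference) works:
attrs = os.listxattr(link, follow_symlinks=False)
print("link's own xattrs:", attrs)
```

Since a symlink stores its target in the link body rather than in data blocks, tar restores it without needing those xattrs, so the errors are probably harmless; still, spot-checking a few links after the restore would be prudent.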
Finally, I took the procedure below from the 1.8.x Lustre manual. Any
comments?
**************************************************************
MDT MOVING PROCEDURE
**************************************************************
----------------------------------------------------------
- Backup procedure
----------------------------------------------------------
1) Mount lustre as 'ldiskfs' type
mount -t ldiskfs /dev/sdb /lustre/mds
2) Change to the file system mount point
cd /lustre/mds
3) Backup all the file system data
tar cSf /backup/mds.tar .
4) Backup Extended Attributes
getfattr -R -d -m '.*' -P . > /backup/ea.bak
5) Unmount the file system
cd
umount /lustre/mds
----------------------------------------------------------
- Restore procedure
----------------------------------------------------------
1) Make a receiving lustre MDT
mkfs.lustre --fsname=jffstg --param mdt.quota_type=ug \
--reformat --mdt --mgs /dev/sdc
2) Mount lustre as 'ldiskfs' type
mount -t ldiskfs /dev/sdc /lustre/mds
3) Change to the file system mount point
cd /lustre/mds
4) Restore the previous tar file
tar xpSf /backup/mds.tar
5) Restore the file system extended attributes
setfattr --restore=/backup/ea.bak
6) Remove the recovery logs
rm OBJECTS/* CATALOGS
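Since the restored MDT is only usable if the extended attributes survive intact, it may be worth cross-checking the two trees after step 5. The sketch below is a hypothetical helper, not a Lustre tool: run it against the original and restored ldiskfs mount points (the two path arguments are placeholders), and it reports the relative paths whose xattrs differ.

```python
import os

def xattr_map(root):
    """Map each path (relative to root) to its {name: value} xattrs,
    without following symlinks."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            try:
                attrs = {a: os.getxattr(path, a, follow_symlinks=False)
                         for a in os.listxattr(path, follow_symlinks=False)}
            except OSError:
                attrs = None  # e.g. entry vanished mid-walk
            result[rel] = attrs
    return result

def compare_trees(orig, restored):
    """Return the relative paths present in only one tree, or present
    in both but with differing xattrs."""
    a, b = xattr_map(orig), xattr_map(restored)
    differing = {p for p in set(a) & set(b) if a[p] != b[p]}
    return sorted((set(a) ^ set(b)) | differing)
```

An empty result from compare_trees("/lustre/mds.orig", "/lustre/mds") would be a reassuring sanity check before removing the old LUN.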
--
Ramiro Alba
Centre Tecnològic de Tranferència de Calor
http://www.cttc.upc.edu
Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 8928
7 years, 11 months
Lustre 2.6 on Ubuntu 14
by Patrice Hamelin
Hi,
I am trying to compile a Lustre 2.6 client on Ubuntu 14.04 (kernel
3.13.0) with Mellanox OFED 2.2-1. I saw that the developers are
working hard to get it supported; what is the status of this? Is it
feasible? I have not succeeded yet.
Thanks.
--
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
Dorval, QC
Gouvernement du Canada | Government of Canada
7 years, 11 months
slow nid removal?
by Michael Di Domenico
I'm running Lustre 2.4.3 on RHEL 6. I have 16 InfiniBand IPoIB NIDs
set up on my Lustre servers, and the networking seems to work fine.
However, when I remove the NIDs using 'lctl net unconfigure', it seems
to take 20-22 seconds to remove each one. There are no errors and the
command works just fine; I'm just curious why it's taking so long.
There are no other modules loaded; the commands I ran are as follows:
modprobe lnet
lctl net configure
lctl net up
do some lctl pings
lctl net down
lctl net unconfigure
lustre_rmmod
The unconfigure step seems to take 20-22 seconds per NID; everything else is quick.
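For what it's worth, per-step timings like the 20-22 seconds above can be captured with a small wrapper around each command. In this sketch, 'sleep' merely stands in for the real commands so it runs anywhere; on a Lustre server you would substitute the lctl steps from the sequence above.

```python
import subprocess
import time

def timed(cmd):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

# 'sleep' stands in for the real commands; on a Lustre server you would
# time each step, e.g. timed(["lctl", "net", "unconfigure"]).
print(f"sleep took {timed(['sleep', '0.1']):.2f}s")
```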
any thoughts?
thanks
- michael
7 years, 11 months
Re: [HPDD-discuss] [Lustre-devel] lustre-1.8.8: rdma_listen() backlog 0 breaks iWARP
by Dilger, Andreas
On 2014/07/23, 9:44 AM, "Steve Wise" <swise(a)opengridcomputing.com> wrote:
>Hello,
>
>I'm trying to get lustre-1.8.8/RHEL6 running over Chelsio iWARP RNICs and
>connection setup
>is failing at the server due to kiblnd_startup() calling rdma_listen()
>with a backlog of
>0. This effectively rejects all incoming connection requests. I looked
>at lustre-1.8.7,
>and the backlog was 256 in that release.
>
>Q: Why was it changed to 0?
Since I'm not familiar with the LNET code myself, I'd recommend
checking the commit messages in Git to see if there is an explanation,
or the linked Jira/Bugzilla ticket.
You may also want to check whether this is fixed in the 1.8.9 release.
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
7 years, 11 months