LUG '13 will take place in San Diego this year at the Omni Hotel in the
Downtown Gaslamp Quarter. LUG Sessions will be held from April 16-18.
Registration is not yet open, but stay tuned for a detailed announcement
from the LUG planning committee coming soon.
The LUG program committee would like to invite members of the Lustre
community to submit presentation abstracts for inclusion in this year's
meeting. It is not necessary to submit a technical paper; just an
abstract of your proposed talk, no more than a page in length. Talks
should target half an hour and reflect interesting Lustre
development, applications, or practices. The deadline to submit
presentation abstracts is March 4, 2013.
For abstract submission, we will be using the EasyChair conference system.
The website for LUG 2013 is here:
We're really looking forward to an interesting and exciting program for
this year's meeting in San Diego!
If you have questions or problems, feel free to mail Stephen Simms
(ssimms(a)iu.edu), LUG 2013 program chair.
The LUG '13 Program Committee
Today I ran a few tests with a client running the latest git master
(commit 57373a2), manually compiled for my kernel. After mounting my test
file system I noticed that /proc/fs/lustre was completely empty. Has
anyone else noticed this? Is this expected? Did I miss anything?
The client is running RHEL6.3 in case that makes any difference.
Unfortunately I no longer have the build logs, so I can't check whether
there were any compiler warnings, but I didn't notice anything obvious.
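In case anyone wants to compare, here is a minimal sketch of the check (the helper function is just for illustration; on a healthy client the directory has entries like devices, llite, osc, and so on):

```shell
#!/bin/sh
# Report whether a procfs directory exists and has any entries.
# On a 2.x client /proc/fs/lustre is populated once the modules
# are loaded; an empty directory suggests the proc entries were
# never registered.
check_procdir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "$dir: missing"
    elif [ -z "$(ls -A "$dir")" ]; then
        echo "$dir: present but empty"
    else
        echo "$dir: populated ($(ls -A "$dir" | wc -l) entries)"
    fi
}

check_procdir /proc/fs/lustre
```

If lctl is installed, "lctl get_param version" is a path-independent cross-check, though I'm not sure how it behaves when the proc tree itself is missing.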
Senior Computer Systems Administrator phone: +44 1235 77 8624
Diamond Light Source Ltd. mob: +44 7917 08 5110
I am running fsck on a Lustre OST LUN, and it shows the following:
[root@oss4 ~]# e2fsck -f -C 0 /dev/mapper/mpath7
e2fsck 1.42.3.wc1 (28-May-2012)
Pass 1: Checking inodes, blocks, and sizes
Inode 192820368, i_blocks is 15512, should be 4294982808. Fix? yes
Pass 2: Checking directory structure
i_blocks_hi for inode 192820368 (/O/0/d0/42065024) is 1, should be zero.
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
nobackup-OST001b: ***** FILE SYSTEM WAS MODIFIED *****
nobackup-OST001b: 1756205/477626368 files (5.6% non-contiguous), 1529484010/1910505472 blocks
I have run fsck twice, and the same error on that same inode appeared both times, which worries me: why doesn't it actually get fixed when I say yes? Any thoughts on this?
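For what it's worth, 4294982808 is exactly 15512 + 2^32, which lines up with the pass-2 message that i_blocks_hi is 1 when it should be zero; the two passes seem to disagree about the high word of the block count. If you want to see the raw inode between fsck runs, debugfs (from e2fsprogs) can stat it read-only; a sketch, with your device and inode number shown as the commented example:

```shell
#!/bin/sh
# Dump a raw ext3/ext4 inode (including its block count fields)
# from an image or device using debugfs, from e2fsprogs.
# Run it only against an unmounted device.
# Usage: stat_inode <device-or-image> <inode-number>
stat_inode() {
    # -R runs a single debugfs command; "stat <N>" takes the
    # inode number in angle brackets. The version banner goes
    # to stderr, so hide it.
    debugfs -R "stat <$2>" "$1" 2>/dev/null
}

# For the OST above this would be something like:
#   stat_inode /dev/mapper/mpath7 192820368
```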
CAEN Advanced Computing
We've been having a devil of a time figuring out what's been going on with
what appears to be a memory leak in the Lustre page cache.
Some specifics: we are running Lustre 2.1.2, on RHEL 6.3, with kernel
2.6.32-279.11.1.el6. The symptom is that we end up with a system with
nearly all of its memory consumed by pages on the inactive list. Here's
what the relevant lines from /proc/meminfo look like:
MemTotal: 32914040 kB
MemFree: 9382100 kB
Active(file): 33348 kB
Inactive(file): 19620564 kB
A couple of users seem to be triggering this behavior; exactly
what causes it is still unclear, but one user triggers it just by tarring
up and removing some files. The inactive list keeps growing until
eventually the OOM killer comes along, kills off various things in a
futile attempt to free memory, and the system has to be rebooted.
I can't claim to be an expert on the Linux VM system, but I decided to
try my best at digging into this some more. I figured out where the
inactive page list was in memory and started looking at the pages using
"crash". There were over 5 million entries on the inactive page list,
so obviously I didn't look at them all; I picked a few at random and looked
at them. What I found was:
- They were all pages owned by Lustre (their backing store pointer was
pointing to Lustre).
- They weren't dirty; they were all marked up-to-date.
From my reading of the Linux VM code, pages marked with private data (which
is the case here) get their releasepage op called, which in this case is
a pointer to ll_releasepage. I took a look at ll_releasepage, but it wasn't
easy to determine exactly why the page wouldn't be released in this case.
I'm glad to look at anything people could suggest; I was going to instrument
ll_releasepage to see (a) whether its not releasing pages is the cause of
this problem and (b) if so, exactly why it isn't releasing them.
I took a look at what (I think) were the contents of the pages, but
it was just binary data, so it wasn't anything helpful to me. I was
thinking that maybe it was directory data (we have one user here who has
directory files that have sizes in the hundreds of megabytes), but it
wasn't obvious to me that it was directory data.
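In case it helps anyone reproduce the observation, the snapshot above can be pulled with a one-line awk over /proc/meminfo (field names as they appear on RHEL 6; the helper name is mine):

```shell
#!/bin/sh
# Print the /proc/meminfo fields relevant to this leak (values
# are in kB). If the Inactive(file) pages were ordinary clean
# page cache, "echo 1 > /proc/sys/vm/drop_caches" (as root)
# should shrink the number; pages whose release ll_releasepage
# refuses will stay put.
meminfo_snapshot() {
    awk '/^(MemTotal|MemFree|Active\(file\)|Inactive\(file\)):/ {
        printf "%-16s %12d kB\n", $1, $2
    }' /proc/meminfo
}

meminfo_snapshot
```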
I am new to Lustre. I have Lustre 2.3.0 installed on CentOS 6.3 (2.6.32-279 kernel) on two nodes.
These nodes are connected via 40Gbps IB interfaces.
Node-1, acting as the Lustre server, runs one MGS/MDT (on /dev/sda3) and one OST (on /dev/sda4).
Both /dev/sda3 and /dev/sda4 are on a regular SCSI disk with 256GB capacity (6Gbps link).
Node-2 is the lustre client.
When I run "dd if=/dev/zero of=/mnt-point-lustre bs=30M count=1"
I get about 550MB/s, which is reasonable. Now, if I change the block-size/count
combination to anything less or more than 30M, the performance drops considerably.
What is so magical about 30 megabytes? What parameters can I tune?
Preferably I would like to use bigger file sizes such as 1G, 5G, 10G and beyond.
Is it even possible to get better throughput for bigger files with a single
OST/MDT combination, or do I need multiple OSTs, etc.?
Many thanks in advance for your help.
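One thing worth checking before tuning: without a sync, a short dd mostly measures the client's page cache rather than the OST, which could explain both the 550MB/s figure and the sensitivity to size. A sketch of a sweep that flushes before the clock stops (the target path and size list are placeholders; conv=fdatasync is GNU dd):

```shell
#!/bin/sh
# Time dd writes at several block sizes, forcing data out to the
# filesystem before dd reports (conv=fdatasync), so the figure
# is not just the client page cache.
# Usage: dd_sweep <target-file> <bs>...
dd_sweep() {
    target=$1; shift
    for bs in "$@"; do
        printf 'bs=%s: ' "$bs"
        # dd prints its timing summary as the last line on stderr.
        # count is fixed at 4 blocks just to keep the sketch
        # short; scale it up for real runs.
        dd if=/dev/zero of="$target" bs="$bs" count=4 \
            conv=fdatasync 2>&1 | tail -n 1
        rm -f "$target"
    done
}

# e.g. dd_sweep /path/to/lustre/testfile 1M 4M 30M 64M
```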